I am a member of the Statistics Research Department at AT&T Labs in Florham Park, NJ, where I work on hierarchical Bayesian modeling, MCMC methods, hidden Markov models, and other topics related to applied statistics.
4/25/2013: My paper with Howard Karloff, "Maximum Entropy Summary Trees", has been accepted for publication at EuroVis 2013. I'm really excited about this work: it is an algorithm for summarizing the structure of a large, rooted, node-weighted tree that leads to nice visualizations. We define a "summary tree" as an aggregation of the nodes of the original tree subject to certain constraints. Then, our algorithm computes the maximum entropy summary tree, where we define the entropy of a node-weighted tree as the entropy of the discrete probability distribution whose probabilities are the normalized node weights. The result is a way to visualize a 100-node summary, for example, of a really huge tree (which might have had 500,000 nodes to begin with), where this particular 100-node summary is by definition the most informative such summary (according to entropy) among all possible summaries of the same size. Sequentially viewing the maximum entropy k-node summary trees for k = 2, 3, 4, ..., 100 is a really nice way to do some visual EDA on large, hierarchical data.
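The entropy definition above can be illustrated with a minimal sketch. This is not the algorithm from the paper: it only evaluates the entropy of a given set of node weights and does not search over summary trees. The function name `tree_entropy`, the choice of base-2 logarithms, and the example weights are all assumptions made for illustration.

```python
import math


def tree_entropy(weights):
    """Entropy (in bits) of the distribution formed by normalizing node weights.

    Hypothetical helper: only the node weights matter here, not the tree shape.
    The base-2 logarithm is an assumption; any base gives the same ranking.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("node weights must have a positive sum")
    # Zero-weight nodes contribute nothing to the entropy (0 * log 0 := 0).
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


# Two made-up 3-node summaries of the same tree (total weight 100 in both).
balanced = [40, 30, 30]
lopsided = [98, 1, 1]

print(tree_entropy(balanced))
print(tree_entropy(lopsided))
```

Under this definition the balanced summary scores higher than the lopsided one, which matches the intuition that a summary spreading the weight more evenly across its nodes is more informative.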
Here is a link to the paper and to the webpage for summary trees, which includes more discussion and the supplementary material for the paper (an appendix + some examples). My plans for the next steps include an R package and a d3 implementation.
Below is the 56-node maximum entropy summary tree of the Mathematics Genealogy tree rooted at Carl Gauss (forced to be a tree by removing all but the primary advisor of each student), which has over 43,000 nodes in its original form.
For older news, click here.
Shirley, K.E., Small, D.S., Lynch, K.G., Maisto, S.A., and Oslin, D.W. (2010). Hidden Markov models for alcoholism treatment trial data. Annals of Applied Statistics, Vol. 4, No. 1, 366-395. [pdf]
Jensen, S.T., Shirley, K.E., and Wyner, A.G. (2009). Bayesball: A Bayesian hierarchical model for evaluating fielding in major league baseball. Annals of Applied Statistics, Vol. 3, No. 2, 491-520. [pdf]