To date, the use of complex graph models has been limited
by three interrelated factors: the complexity of realistic
models, a paucity of empirically relevant simulation studies, and
a poor understanding of the properties of inferential methods.
In this talk we discuss solutions to these limitations. We
emphasize the importance of likelihood-based inferential
procedures and the role of Markov Chain Monte Carlo (MCMC)
algorithms for simulation and inference.
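As a rough illustration of MCMC simulation for a random graph model, the
sketch below runs a Metropolis edge-toggle sampler for the simplest
exponential-family model, where the probability of a graph is proportional
to exp(theta * number of edges). The model choice, the function name
sample_edge_model, and all parameter defaults are assumptions made for
illustration; they are not taken from the talk.

```python
import math
import random
from itertools import combinations


def sample_edge_model(n_nodes, theta, n_steps=20000, burn_in=5000, seed=0):
    """Metropolis sampler for an undirected graph with
    P(G) proportional to exp(theta * number_of_edges).

    Returns the mean graph density over the post-burn-in steps.
    (Hypothetical helper; the edge-count model is an assumed toy example.)
    """
    rng = random.Random(seed)
    dyads = list(combinations(range(n_nodes), 2))
    edges = set()  # start from the empty graph
    total_edges = 0
    kept = 0
    for step in range(n_steps):
        dyad = rng.choice(dyads)  # propose toggling one dyad
        delta = -1 if dyad in edges else 1  # change in the edge count
        # Metropolis acceptance probability for a symmetric proposal.
        if rng.random() < min(1.0, math.exp(theta * delta)):
            edges.symmetric_difference_update({dyad})
        if step >= burn_in:
            total_edges += len(edges)
            kept += 1
    return total_edges / (kept * len(dyads))


if __name__ == "__main__":
    print(sample_edge_model(10, 0.0))   # density when theta = 0
    print(sample_edge_model(10, -2.0))  # sparser graphs for negative theta
```

Under this toy model the edges are independent, so the sampled density can
be checked against exp(theta) / (1 + exp(theta)). Models with dependence
terms do not have such a closed form, which is where simulation becomes
necessary.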
Motivated by our current research on visualizing consumer market
structure using graphs, we are in the process of implementing R
functions that allow for trellis-style displays with a graph
layout. While standard trellis graphics use a rectangular grid for the
panels, we use the nodes of a graph (projected into 2-dimensional
space) as positions for the panels. In addition, each edge of the
graph is also drawn using a panel function.
In the simplest case, the node panels draw a circle with the name of
the node, and the edge panels draw lines or arrows to give the
standard picture of a graph. However, arbitrary panels from the
lattice package can be used, e.g., to draw a barplot or scatterplot in
each node of the graph. While interactive tools like ggobi are much
better suited for exploratory analysis, we aim at publication-quality
figures capable of visualizing data with an underlying graph
structure of moderate size.
As an example we use marketing data on product perceptions. Each node
of the graph represents a cluster of perceptions, and edges connect
neighboring clusters. For each node we have several background
variables like distribution of brands, sales, time or
sociodemographic data of customers. Differences between the market
segments are tested using permutation tests and visualized using the
framework described above.
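A permutation test of the kind mentioned above can be sketched as follows.
The abstract does not say which test statistic is used, so the absolute
difference in group means, the function name permutation_test, and the
default number of permutations are all assumptions for illustration.

```python
import random


def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns an estimated p-value. (Hypothetical helper; the statistic is
    an assumed choice, not the one used in the talk.)
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the observations
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    segment_1 = [1, 2, 3, 4, 5]
    segment_2 = [101, 102, 103, 104, 105]
    print(permutation_test(segment_1, segment_2))
```

The segment values here are made-up numbers, chosen only to show the call.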
A social network caught in the Web
Lada Adamic, Orkut Buyukkokten and Eytan Adar
We present an analysis of Club Nexus, an online community at Stanford
University. Through the Nexus site we were able to study a reflection
of the real world community structure within the student body. We
observed and measured social network phenomena such as the small world
effect, clustering, and the strength of weak ties. Using the rich
profile data provided by the users we were able to deduce the
attributes contributing to the formation of friendships, and to
determine how the similarity of users decays as the distance between
them in the network increases. In addition, we found correlations
between a user's personality and their other attributes, as well as
interesting correspondences between how users perceive themselves and
how they are perceived by others.
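One common way to quantify clustering in a friendship network is the local
clustering coefficient: the fraction of a user's pairs of friends who are
themselves friends. The sketch below assumes an undirected graph stored as
a dictionary of neighbor sets; the function names and the toy data are
illustrative and are not drawn from the Club Nexus analysis.

```python
def local_clustering(adjacency, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other.

    `adjacency` maps each node to the set of its neighbors (undirected graph).
    Nodes with fewer than two neighbors get a coefficient of 0.0.
    """
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count each linked pair of neighbors once by requiring u < v.
    links = sum(
        1 for u in neighbors for v in neighbors
        if u < v and v in adjacency[u]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(adjacency):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


if __name__ == "__main__":
    triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    print(average_clustering(triangle))
```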
Statistical Inference for Large Directed Graphs with Communities of
Interest
Deepak Agarwal
Inference for edge data in large directed graphs is computationally
challenging since one has to account for dependencies that exist between
edges. We describe a framework for scaling the computations by using
"Communities of Interest" recently introduced in the literature (Cortes,
Pregibon and Volinsky). A Community of Interest is a small subgraph centered
around a node and is assumed to capture the influence the node has on the
entire graph. This
approximation enables us to work locally with a large number of small
subgraphs whose union constitutes the entire graph. Inference for each
subgraph is done using Bayesian Stochastic Blockmodels.
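The idea of working with a small subgraph centered on a node can be
sketched as follows. This is a simplification: it keeps every node within
a fixed number of hops of the center, ignoring edge direction when
measuring distance, whereas the Community of Interest construction in the
cited literature may use a different rule. The function name ego_subgraph
and the radius parameter are hypothetical.

```python
from collections import deque


def ego_subgraph(edges, center, radius):
    """Return (nodes, kept_edges) for the subgraph within `radius` hops
    of `center`.

    `edges` is a list of directed (source, target) pairs. Distance is
    measured ignoring direction; directed edges between retained nodes
    are kept in their original order.
    """
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    # Breadth-first search out to the requested radius.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for nb in neighbors.get(node, ()):
            if nb not in distance:
                distance[nb] = distance[node] + 1
                queue.append(nb)

    nodes = set(distance)
    kept_edges = [(s, d) for s, d in edges if s in nodes and d in nodes]
    return nodes, kept_edges


if __name__ == "__main__":
    calls = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "a")]
    print(ego_subgraph(calls, "a", 1))
```

Running this for every node yields many small subgraphs that together
cover the whole graph, which is the property the local approach relies on.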
Degeneracy and Inference for Social Networks Models
Mark S. Handcock
We consider statistical and stochastic models for
random networks that can be used to represent the structural
characteristics of the networks. In our applications, the
nodes usually represent people, and the edges represent
a specified relationship between the people.
Graph theory and statistical inference considerations for protein-protein
interaction and gene expression data
Denise Scholtens
Coordination of gene expression data, protein-protein interactions, and
other high-throughput information about the properties of the cell is
fundamental to bioinformatics research. We pose several recent
bioinformatics problems in terms of graphs and emphasize some statistical
principles that should be considered when doing inference on multiple
graphs. We also demontrate the use of the graph and Rgraphviz packages in
Bioconductor as visualization and analysis tools for bioinformatics graph
data types.
Graphical Representations of Knowledge and Its Distribution
Cliff Behrens
We are currently exploring the use of graphs to visualize
knowledge and its distribution among subject matter experts (SMEs) as
a means of improving collaborative modeling and information discovery.
This talk will report on two related applications. The first
describes research for DARPA that builds knowledge "contour" maps from
similarities among SME responses and estimates of their knowledge in a
domain. This map motivates "knowledge-based" collaboration and the
use of new collaboration tools by revealing potential advice-giving
relationships among SMEs. The second application involves creating
visualizations of semantic neighborhoods for a user's query in vector
spaces computed independently with Latent Semantic Indexing (LSI). In
this case, graphics are provided to help a user decide whether a
particular vector space contains appropriate context for a query.
Graphics can also yield new insights about terms that contribute
meaning to core concepts in a knowledge domain. Some background and
example visualizations will be presented for each application.
Trellis-Style Displays with a Graph Layout in R
Friedrich Leisch
Dynamic Network Visualization: Methods for Meaning with Longitudinal
Network Movies
James Moody, Daniel A. McFarland, and Skye Bender-DeMoll
Increased interest in longitudinal social networks and the recognition that
visualization fosters theoretical insight create a need for dynamic
network visualizations, or network "movies." The successful
development of network movies requires confronting a number of
theoretical questions surrounding the temporal representations of social
networks and technical questions about how best to link network change to
changes in the graphical representation. We divide network movies into
two major classes: static flip books, where node position remains
constant but edges cumulate over time, and dynamic movies, where nodes
move as a function of changes in relations. Static graphs are particularly useful
in contexts where relations are sparse. When the network is more
connected, movies are often more appropriate, and the bulk of our
discussion focuses on techniques and challenges associated with
developing meaningful dynamic network movies. We explore the returns to
different movie styles using three empirical examples. A new software
program for creating network movies is discussed in the appendix.
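The flip-book idea, in which node positions stay fixed and edges cumulate
over time, can be sketched as a sequence of growing edge sets, one per
frame. The function name flip_book_frames and the input layout (one list
of edges per time slice) are assumptions for illustration and do not
describe the software discussed in the paper.

```python
def flip_book_frames(time_slices):
    """Build cumulative edge sets for a static flip book.

    `time_slices` is a list of edge lists, one per time period. Frame t
    contains every edge observed in periods 0..t, so edges only accumulate.
    """
    frames = []
    seen = set()
    for edges in time_slices:
        seen.update(edges)
        frames.append(set(seen))  # copy so later updates do not alter it
    return frames


if __name__ == "__main__":
    slices = [[("a", "b")], [("b", "c")], []]
    for t, frame in enumerate(flip_book_frames(slices)):
        print(t, sorted(frame))
```

Each frame would then be drawn with the same node coordinates, so only the
edge set changes between pages.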
A Brief Guided Tour of the Java Universal Network/Graph Framework
Scott White
Very large networks with arbitrarily rich attribute structure have
become increasingly common in recent years. Examples include
collaboration networks, protein interaction networks, and
telecommunication networks. The Java Universal Network/Graph Framework
(JUNG, sourceforge.net/projects/jung/) provides a general Java-based software development framework
designed to support the modeling, analysis, and visualization of data
that can be represented as graphs. In this brief talk, I will give an
overview of JUNG and show how it can be used to support various tasks and
queries (e.g. clustering, ranking, filtering, visualization, etc.) that
are common to large-scale network analysis.