
Professional Life:
Xin (Luna) Dong
lunadong@research.att.com
Data Management Dept
AT&T Labs--Research
Bld 103, Rm B281
180 Park Ave.
Florham Park, NJ 07932
Tel: (973)360-8508
Fax: (973)360-8421
Xin (Luna) Dong
I am currently a researcher in the Data Management Department at
AT&T Labs - Research. I received my Ph.D. in
Computer Science and Engineering at
Univ. of Washington. Before coming to the United States, I
obtained an M.S. in Computer Science at
Peking University, and a B.S. in Computer Science at
Nankai University in China.
You can find my C.V.
here.
Research Interest
The goal of my research is to help people organize, access, and
search information effectively and efficiently. My research aims at
answering the following questions:
- The amount of information produced in the world increases by
30% every year and this rate will only go up. On the one hand,
how can we take full advantage of the abundant information
to improve the quality of data? On the other hand, how can we help
users find the answers that address their questions without
overwhelming them with the huge amount of relevant information?
- Nowadays more and more data-management applications need to
integrate a large volume of heterogeneous data, but data
integration often requires significant upfront effort and
technical expertise. How can we enable data sharing at
minimal cost while still guaranteeing the quality of the
shared data?
- Data exist in various forms: structured, semi-structured, and
unstructured. How can we realize the full potential of the structure
that exists explicitly or implicitly in data to best fulfill
people's information needs?
- The world contains rich associations: between real-world
entities, between data sources, and between data items. How can we
discover and leverage such associations to facilitate search?
- How can we help non-technical users best organize, access,
and understand information?
In particular, my current research interests include
- Data integration
- Data cleaning
- Web search
- Record linkage
- Personal information management, community information
management, and enterprise data management
Projects
Currently my research focuses on two directions:
source-dependence discovery for data cleaning and data integration,
and data integration with uncertainty.
Solomon---Detecting dependence between data sources
The Internet accelerates the rate at which information is
produced and eases the duplication and transmission of data across
data sources. Solomon aims to discover dependencies between
data sources and to leverage that knowledge both for deciding the
truth among conflicting information and for efficiently answering
queries over a set of data sources. [Vision paper][Talk]
UDI---Data Integration with Uncertainty
As the scope of data integration applications broadens, data
integration systems need to handle uncertainty at various levels
and do so in a principled fashion. Our goal in this project is to
build a system that can quickly provide search services over a
set of heterogeneous data sources without requiring strenuous
work on setting up the precise mappings between the sources. [Talk][DBClip]
My recent projects include
- Semex---Personal information management system
- Woogle---Web-service search engine
- Piazza---Peer data management system
Selected Publications
You can find the full list of my publications
here and my DBLP entry
here. Below is a list of selected papers categorized by research
area.
- Data fusion and source dependence
- Xin Luna Dong and Felix Naumann. Data
fusion--Resolving data conflicts for integration. In
VLDB, 2009. [PDF][Presentation]
- Xin Luna Dong, Laure Berti-Equille, and Divesh
Srivastava. Integrating conflicting data: the role of
source dependence. In VLDB, 2009. [PDF][Presentation]
- Xin Luna Dong, Laure Berti-Equille, and Divesh
Srivastava. Truth discovery and copying detection in a
dynamic world. In VLDB, 2009. [PDF][Presentation]
- Laure Berti-Equille, Anish Das Sarma, Xin Luna Dong,
Amelie Marian, and Divesh Srivastava. Sailing the
information ocean with awareness of currents: discovery
and application of source dependence. In CIDR,
2009. [PDF][Presentation]
- Dataspaces
- Daisy Zhe Wang, Xin Luna Dong, Anish Das Sarma,
Michael Franklin, Alon Halevy. Functional Dependency
Generation and Applications in Pay-As-You-Go Data
Integration Systems. In WebDB, 2009. [PDF]
- Anish Das Sarma, Xin Dong, and Alon Y. Halevy:
Bootstrapping Pay-as-you-go Data Integration Systems. In
SIGMOD, 2008. [PDF]
- Xin Dong, Alon Y. Halevy and Cong Yu: Data
Integration with Uncertainties. In VLDB, 2007. [PDF][Presentation][DBClip][JournalVersion]
- Xin Dong and Alon Y. Halevy: Indexing Dataspaces. In
SIGMOD, 2007. [PDF][Presentation]
- Jing Liu, Xin Dong and Alon Y. Halevy: Answering
Structured Queries on Unstructured Data. In WebDB
2006. [PDF][Presentation]
- Xin Dong, Alon Y. Halevy and Jayant Madhavan:
Reference Reconciliation in Complex Information Spaces.
In SIGMOD 2005. [PDF][Presentation]
- Personal information management
- Yuhan Cai, Xin Dong, Alon Y. Halevy, Jing Liu and
Jayant Madhavan: Personal Information Management with
SEMEX. SIGMOD Demo 2005. (BEST DEMO, one of
three top demos)[PDF][Presentation]
- Xin Dong and Alon Y. Halevy: A Platform for Personal
Information Management and Integration. In CIDR
2005. [PDF][Presentation]
- Misc
- Xin Dong, Alon Y. Halevy, Jayant Madhavan, Ema Nemes
and Jun Zhang: Similarity Search for Web Services. In
VLDB 2004. [PDF][Presentation]
- Xin Dong, Alon Y. Halevy and Igor Tatarinov:
Containment of Nested XML Queries. In VLDB 2004.
[PDF][Presentation][Tech-report]
Talks
- Data fusion--Resolving data conflicts for integration. [PPT]
- NDBC tutorial, Nanchang, China, October 2009. [PPT]
- VLDB tutorial, Lyon, France, August 2009.
- Sailing the information ocean with awareness of currents:
discovery and application of source dependence. [PPT]
- NDBC invited talk, Nanchang, China, October 2009.
- SKG tutorial, Zhuhai, China, October 2009.
- AT&T, Florham Park NJ, July 2009.
- Data integration with uncertainty. [PPT]
- SKG panel talk, Zhuhai, China, October 2009.
- Database group, Renmin University, Beijing, China,
September, 2009.
- Computer Science Dept, New Jersey's Science &
Technology University, Newark, NJ, Feb 2009.
- Database Group, UPenn, Philadelphia, PA, Nov
2008.
- Computer Science Dept, Stevens Institute of
Technology, Hoboken NJ, Sep 2008.
- AT&T, Florham Park NJ, May 2008.
- Managing a space of heterogeneous data. [PPT]
- Database group, Renmin University, Beijing, China,
June 2007.
- University of Wisconsin, Madison WI, February
2007.
- Semex: A platform for personal information management and
integration. [PPT]
- Microsoft Research, Adaptive System & Interaction
Core Group, Seattle, November 2005.
- Microsoft Research Asia, Beijing, China, June
2005.
Patents
- Detecting Dependence Between Sources in Truth Discovery. Xin
Dong, Laure Berti-Equille, Divesh Srivastava. United States
Patent, filed 5/2009, to be issued.
- Minimal difference query and view matching. Raghav Kaushik,
Venkatesh Ganti and Xin Dong. United States Patent 7251646,
issued 7/31/2007.
- Method and apparatus for updating XML views of relational
data. Philip L. Bohannon, Xin
Dong, Henry F. Korth, Suryanarayan Perinkulam. United States
Patent 20050165866, filed Jan 28, 2004, to be issued.
Recent Professional Activities
- PC chair for SKG'09.
- PC member for VLDB Demo'10, WWW'10, NTII'10, ICDE'10,
CIKM'09, WebDB'09, VLDB'09, CIKM'08, WebDB'08, VLDB Demo'08,
WWW'08.
- NIH contract reviewer, 2008.
- Referee for PVLDB, VLDB Journal, TODS, TCS, TOIT, TOIS,
TKDE, IS.
Resources
- Here is a long and growing list of papers on databases, IR,
and AI that I have collected during my research and reading.
- Here is a collection of wisdom on career, research, life,
etc.