
Professional Life:
Xin (Luna) Dong
lunadong@research.att.com
Data Management Dept
AT&T Labs--Research
Bld 103, Rm B281
180 Park Ave.
Florham Park, NJ 07932
Tel: (973)360-8508
Fax: (973)360-8421
I am currently a researcher in the Data Management Department at
AT&T Labs - Research. I received my Ph.D. in
Computer Science and Engineering from the
University of Washington. Before coming to the United States, I
obtained an M.S. in Computer Science at
Peking University and a B.S. in Computer Science at
Nankai University in China.
You can find my C.V.
here.
Research Interests
The goal of my research is to help people organize, access, and
search information effectively and efficiently. My research aims to
answer the following questions:
-
The amount of information produced in the world grows by 30%
every year, and this rate will only go up. In the big-data
environment, where we more often lack the resources to manage data
than the data itself, how can we balance the quality of integrated
data against the resources we spend on integrating it?
-
The Web has been changing our lives enormously and people rely
more and more on the Web to fulfill their information needs.
Compared with traditional media, information on the Web can be
published quickly, but with fewer guarantees of quality and
credibility. How can we leverage the collective wisdom on the
Web to improve the quality of the data integrated from different
Web sources?
-
Nowadays more and more data-management applications need to
integrate large volumes of heterogeneous data, but data
integration often requires significant upfront effort and
technical expertise. How can we enable data sharing at
minimal cost while still guaranteeing the quality of the
shared data?
-
Data exist in various forms: structured, semi-structured, and
unstructured. How can we realize the full potential of the structure
that exists, explicitly or implicitly, in data to best fulfill
people's information needs?
-
How can we help non-technical users best organize, access, and
understand information?
In particular, my current research interests include:
-
Data integration
-
Data cleaning
-
Web search
-
Personal information management, community information
management, enterprise data management.
Projects
Currently my research focuses on two directions: source selection
for balancing integration quality and integration cost, and data
fusion via copy detection.
Alexander---Source exploration and selection
In the big-data environment, we more often lack the resources to
manage data than the data themselves. Alexander aims to help
administrators explore the available data sources and select
sources so as to balance the quality of integration against
its cost.
Solomon---Detecting dependence between data sources
The Internet accelerates the production of information and makes
it easy to duplicate and transmit data across data sources.
Solomon aims to detect copying between data sources and to
leverage this knowledge both for determining the truth among
conflicting information and for efficiently answering queries
over a set of data sources. [Vision paper][Talk][Demo]
My recent projects include:
-
UDI---Data Integration with uncertainty [Talk][DBClip]
-
Semex---Personal information management system
-
Woogle---Web-service search engine
-
Piazza---Peer data management system
Selected Publications
You can find the full list of my publications
here and my DBLP entry
here. Below is a list of selected papers categorized by research
area.
-
Data fusion and copying detection
-
Xuan Liu, Xin Luna Dong, Beng Chin Ooi, and Divesh
Srivastava: Online data fusion. In VLDB, 2011. [PDF][Presentation]
-
Xin Luna Dong and Divesh Srivastava. Large-Scale Copying
Detection. Tutorial in SIGMOD, 2011. [PDF][Presentation]
-
Anish Das Sarma, Xin Luna Dong, Alon Halevy. Data
integration with dependent sources. In EDBT,
2011. [PDF][Presentation]
-
Xin Luna Dong, Laure Berti-Equille, Yifan Hu, and Divesh
Srivastava. Solomon: Seeking the truth via copying
detection. Demo in VLDB, 2010. [PDF][Poster][Demo]
-
Xin Luna Dong, Laure Berti-Equille, Yifan Hu, and Divesh
Srivastava. Global detection of complex copying
relationships between sources. In VLDB, 2010. [PDF][Presentation]
-
Xin Luna Dong and Felix Naumann. Data fusion--Resolving
data conflicts for integration. Tutorial in VLDB,
2009. [PDF][Presentation]
-
Xin Luna Dong, Laure Berti-Equille, and Divesh
Srivastava. Truth discovery and copying detection in a
dynamic world. In VLDB, 2009. [PDF][Presentation]
-
Xin Luna Dong, Laure Berti-Equille, and Divesh
Srivastava. Integrating conflicting data: the role of
source dependence. In VLDB, 2009. [PDF][Presentation]
-
Laure Berti-Equille, Anish Das Sarma, Xin Luna Dong,
Amelie Marian, and Divesh Srivastava. Sailing the
information ocean with awareness of currents: discovery
and application of source dependence. In CIDR,
2009. [PDF][Presentation]
-
Record linkage
-
Pei Li, Xin Luna Dong, Andrea Maurino, and Divesh
Srivastava: Linking Temporal Records. In VLDB,
2011. [PDF][Presentation][JournalVersion]
-
Songtao Guo, Xin Luna Dong, Divesh Srivastava, and Remi
Zajac. Record Linkage with Uniqueness Constraints and
Erroneous Values. In VLDB, 2010. [PDF][Presentation]
-
Xin Dong, Alon Y. Halevy and Jayant Madhavan: Reference
Reconciliation in Complex Information Spaces. In
SIGMOD, 2005. [PDF][Presentation]
-
Dataspaces
-
Daisy Zhe Wang, Xin Luna Dong, Anish Das Sarma, Michael
Franklin, Alon Halevy. Functional Dependency Generation
and Applications in Pay-As-You-Go Data Integration
Systems. In WebDB, 2009. [PDF]
-
Anish Das Sarma, Xin Dong, and Alon Y. Halevy:
Bootstrapping Pay-as-you-go Data Integration Systems. In
SIGMOD, 2008. [PDF]
-
Xin Dong, Alon Y. Halevy and Cong Yu: Data Integration
with Uncertainties. In VLDB, 2007. [PDF][Presentation][DBClip][JournalVersion]
-
Xin Dong and Alon Y. Halevy: Indexing Dataspaces. In
SIGMOD, 2007. [PDF][Presentation]
-
Jing Liu, Xin Dong and Alon Y. Halevy: Answering
Structured Queries on Unstructured Data. In WebDB,
2006. [PDF][Presentation]
-
Personal information management
-
Yuhan Cai, Xin Dong, Alon Y. Halevy, Jing Liu and Jayant
Madhavan: Personal Information Management with SEMEX.
Demo in SIGMOD, 2005 (Best Demo, one of three top
demos). [PDF][Presentation]
-
Xin Dong and Alon Y. Halevy: A Platform for Personal
Information Management and Integration. In CIDR,
2005. [PDF][Presentation]
-
Misc
-
Su Chen, Xin Luna Dong, Laks V.S. Lakshmanan and Divesh
Srivastava: We Challenge You to Certify Your Update. In
SIGMOD, 2011. [PDF][Presentation][Poster]
-
Xin Dong, Alon Y. Halevy, Jayant Madhavan, Ema Nemes and
Jun Zhang: Similarity Search for Web Services. In
VLDB, 2004. [PDF][Presentation]
-
Xin Dong, Alon Y. Halevy and Igor Tatarinov: Containment
of Nested XML Queries. In VLDB, 2004. [PDF][Presentation][Tech-report]
Recent Talks
-
Truth Finding on the Deep Web. [PPT]
-
Computer Science Dept., HKUST, Hong Kong, China, April
2012.
-
Develop Your Big Ideas. [PPT]
-
New Researcher Symposium, SIGMOD, Athens, Greece,
June 2011.
-
Large-Scale Copy Detection. [PPT]
-
DASFAA tutorial, Busan, Korea, April 2012.
-
ICDE tutorial, Arlington, USA, April 2012.
-
SIGMOD tutorial, Athens, Greece, June 2011.
-
Solomon: Seeking the Truth Via Copying Detection. [PPT][Video]
-
Computer Science Dept., Tianjin University of Technology,
Tianjin, China, January 2012.
-
Computer Science Dept., Univ. of Washington, WA,
August 2011.
-
Microsoft Research, WA, August 2011.
-
BEWEB invited talk, Uppsala, Sweden, March 2011.
-
Computer Science Dept., SUNY Binghamton, NY, March
2011.
-
QDB keynote talk, Singapore, September 2010. [PPT]
-
AT&T Cookie talk, Florham Park, NJ, August 2010.
-
Data fusion--Resolving data conflicts for integration. [PPT]
-
NDBC tutorial, Nanchang, China, October 2009. [PPT]
-
Institute of Computing Technology, Chinese Academy of
Sciences, Beijing, China, September 2009.
-
VLDB tutorial, Lyon, France, August 2009.
-
Sailing the information ocean with awareness of currents:
discovery and application of source dependence. [PPT]
-
Database group, UC Irvine, CA, February 2010.
-
NDBC invited talk, Nanchang, China, October 2009.
-
SKG tutorial, Zhuhai, China, October 2009.
-
Microsoft Research Asia, Beijing, China, September
2009.
-
AT&T, Florham Park, NJ, July 2009.
Previous Talks
-
Data integration with uncertainty. [PPT]
-
Managing a space of heterogeneous data. [PPT]
-
Semex: A platform for personal information management and
integration. [PPT]
Patents
-
Method and Apparatus for Exploring and Selecting Data Sources.
Xin Dong and Divesh Srivastava. United States Patent, filed
12/2011, to be issued.
-
Online Data Fusion. Xuan Liu, Xin Dong, Beng Chin Ooi, and
Divesh Srivastava. United States Patent, filed 12/2011, to be
issued.
-
Update Certificates. Su Chen, Xin Dong, Laks Lakshmanan, and
Divesh Srivastava. United States Patent, filed 9/2010, to be
issued.
-
Detecting Dependence Between Sources in Truth Discovery. Xin
Dong, Laure Berti-Equille, Divesh Srivastava. United States
Patent, filed 5/2009, to be issued.
-
Minimal difference query and view matching. Raghav Kaushik,
Venkatesh Ganti and Xin Dong. United States Patent 7251646,
issued 7/31/2007.
-
Method and apparatus for updating XML views of relational data.
Philip L. Bohannon, Xin Dong, Henry F. Korth, Suryanarayan
Perinkulam. United States Patent 20050165866, filed
Jan 28, 2004, to be issued.
Recent Professional Activities
-
Associate editor of the
IEEE Data Engineering Bulletin, 9/2011.
-
Co-chair of ACM SIGMOD New Researcher Symposium'12-13,
SIGMOD/PODS Ph.D. Symposium'12, QDB'12, WebDB'10 [Report],
SKG'09 [Issue].
-
PC area chair in ICDE'13, CIKM'11.
-
PC member in PVLDB'13, SIGMOD'12, VLDB'12, EDBT'12 Industry
track, SIGMOD'11, PVLDB'11, WAIM'11, AMW'11, PVLDB'10,
ICDE'10, WWW'10, VLDB Demo'10, NTII'10, PVLDB'09, VLDB'09,
CIKM'09, WebDB'09, WWW'08, CIKM'08, VLDB Demo'08, WebDB'08.
-
Referee for VLDB Journal, TODS, TCS, TOIT, TOIS, TKDE, IS.
-
NSF panelist, 2011.
-
NIH contract reviewer, 2008.
Resources
-
Here is a long and growing list of papers in databases, IR,
and AI that I have collected through my research and reading.
-
Here is a collection of wisdom on career, research, life,
etc.