AT&T Labs - Research


Professional Life:

Xin (Luna) Dong
lunadong@research.att.com
Data Management Dept
AT&T Labs--Research
Bld 103, Rm B281
180 Park Ave.
Florham Park, NJ 07932
Tel: (973)360-8508
Fax: (973)360-8421

 

 

Xin (Luna) Dong

I am currently a researcher in the Data Management Department at AT&T Labs - Research. I received my Ph.D. in Computer Science and Engineering from the University of Washington. Before coming to the United States, I obtained an M.S. in Computer Science from Peking University and a B.S. in Computer Science from Nankai University in China.

You can find my C.V. here.
 


Research Interests

The goal of my research is to help people organize, access, and search information effectively and efficiently. My research aims at answering the following questions:

  • The amount of information produced in the world increases by 30% every year and this rate will only go up. In the big-data environment, where we often lack resources to manage data rather than lacking data, how can we balance the quality of integrated data and the resources we need to spend on integrating data?
  • The Web has been changing our lives enormously and people rely more and more on the Web to fulfill their information needs. Compared with traditional media, information on the Web can be published fast, but with fewer guarantees on quality and credibility. How can we leverage the collective wisdom on the Web to improve the quality of the data integrated from different Web sources?
  • Nowadays more and more data-management applications need to integrate a large volume of heterogeneous data, but data integration often requires significant upfront effort and technical expertise. How can we enable data sharing at minimal cost while still guaranteeing the quality of the shared data?
  • Data exist in various forms: structured, semi-structured, and unstructured. How can we realize the full potential of the structure that exists explicitly or implicitly in data to best fulfill people's information needs?
  • How can we help non-technical users best organize, access, and understand information?

In particular, my current research interests include

  • Data integration
  • Data cleaning
  • Web search
  • Personal information management, community information management, enterprise data management.

Projects

Currently my research focuses on two directions: source selection for balancing integration quality and integration cost, and data fusion via copy detection.
 

Alexander---Source exploration and selection

In the big data environment, we often lack resources to manage data rather than lacking data themselves. Alexander aims at helping administrators explore available data sources and select the sources to balance the quality of integration and the cost of integration.

 

Solomon---Detecting dependence between data sources

The Internet accelerates the rate of information being produced and eases duplication and transmission of data across data sources. Solomon aims at detecting copying between data sources and leveraging such knowledge for deciding truth from conflicting information and for efficiently answering queries over a set of data sources.  [Vision paper][Talk][Demo]

 

My recent projects include

  • UDI---Data Integration with uncertainty [Talk][DBClip]
  • Semex---Personal information management system
  • Woogle---Web-service search engine
  • Piazza---Peer data management system

Selected Publications

You can find the full list of my publications here and my DBLP entry here. Below is a list of selected papers categorized by research area.

  • Data fusion and copying detection
    • Xuan Liu, Xin Luna Dong, Beng Chin Ooi, and Divesh Srivastava: Online data fusion. In VLDB, 2011. [PDF][Presentation]
    • Xin Luna Dong and Divesh Srivastava. Large-Scale Copying Detection. Tutorial in Sigmod, 2011. [PDF][Presentation]
    • Anish Das Sarma, Xin Luna Dong, Alon Halevy. Data integration with dependent sources. In EDBT, 2011. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, Yifan Hu, and Divesh Srivastava. Solomon: Seeking the truth via copying detection. Demo in VLDB, 2010. [PDF][Poster][Demo]
    • Xin Luna Dong, Laure Berti-Equille, Yifan Hu, and Divesh Srivastava. Global detection of complex copying relationships between sources. In VLDB, 2010. [PDF][Presentation]
    • Xin Luna Dong and Felix Naumann. Data fusion--Resolving data conflicts for integration. Tutorial in VLDB, 2009. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Truth discovery and copying detection in a dynamic world. In VLDB, 2009. [PDF][Presentation]
    • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Integrating conflicting data: the role of source dependence. In VLDB, 2009. [PDF][Presentation]
    • Laure Berti-Equille, Anish Das Sarma, Xin Luna Dong, Amelie Marian, and Divesh Srivastava. Sailing the information ocean with awareness of currents: discovery and application of source dependence. In CIDR, 2009. [PDF][Presentation]
  • Record linkage
    • Pei Li, Xin Luna Dong, Andrea Maurino, and Divesh Srivastava: Linking Temporal Records. In VLDB 2011. [PDF][Presentation][JournalVersion]
    • Songtao Guo, Xin Luna Dong, Divesh Srivastava, and Remi Zajac. Record Linkage with Uniqueness Constraints and Erroneous Values. In VLDB, 2010. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy and Jayant Madhavan: Reference Reconciliation in Complex Information Spaces. In SIGMOD 2005. [PDF][Presentation]
  • Dataspaces
    • Daisy Zhe Wang, Xin Luna Dong, Anish Das Sarma, Michael Franklin, Alon Halevy. Functional Dependency Generation and Applications in Pay-As-You-Go Data Integration Systems. In WebDB, 2009. [PDF]
    • Anish Das Sarma, Xin Dong, and Alon Y. Halevy: Bootstrapping Pay-as-you-go Data Integration Systems. In SIGMOD, 2008. [PDF]
    • Xin Dong, Alon Y. Halevy and Cong Yu: Data Integration with Uncertainties. In VLDB, 2007. [PDF][Presentation][DBClip][JournalVersion]
    • Xin Dong and Alon Y. Halevy: Indexing Dataspaces. In SIGMOD, 2007. [PDF][Presentation]
    • Jing Liu, Xin Dong and Alon Y. Halevy: Answering Structured Queries on Unstructured Data. In WebDB 2006. [PDF][Presentation]
  • Personal information management
    • Yuhan Cai, Xin Dong, Alon Y. Halevy, Jing Liu and Jayant Madhavan: Personal Information Management with SEMEX. SIGMOD Demo 2005. (BEST DEMO, one of three top demos)[PDF][Presentation]
    • Xin Dong and Alon Y. Halevy: A Platform for Personal Information Management and Integration. In CIDR 2005. [PDF][Presentation]
  • Misc
    • Su Chen, Xin Luna Dong, Laks V.S. Lakshmanan and Divesh Srivastava: We Challenge You to Certify Your Update. In Sigmod 2011. [PDF][Presentation][Poster]
    • Xin Dong, Alon Y. Halevy, Jayant Madhavan, Ema Nemes and Jun Zhang: Similarity Search for Web Services. In VLDB 2004. [PDF][Presentation]
    • Xin Dong, Alon Y. Halevy and Igor Tatarinov: Containment of Nested XML Queries. In VLDB 2004. [PDF][Presentation][Tech-report]

Recent Talks

  • Truth Finding on the Deep Web. [PPT]
    • Computer Science Dept., HKUST, Hong Kong, China, April 2012.
  • Develop Your Big Ideas. [PPT]
    • New-researcher symposium, Sigmod, Athens, Greece, June 2011.
  • Large-Scale Copy Detection. [PPT]
    • DASFAA tutorial, Busan, Korea, April 2012.
    • ICDE tutorial, Arlington, USA, April 2012.
    • Sigmod tutorial, Athens, Greece, June 2011.
  • Solomon: Seeking the Truth Via Copying Detection. [PPT][Video]
    • Computer Science Dept, Tianjin University of Technology, Tianjin, China, January, 2012.
    • Computer Science Dept, Univ. of Washington, WA, August, 2011.
    • Microsoft Research, WA, August, 2011.
    • BEWEB invited talk, Uppsala, Sweden, March, 2011.
    • Computer Science Dept, SUNY Binghamton, NY, March, 2011.
    • QDB keynote talk, Singapore, September 2010. [PPT]
    • AT&T Cookie talk, Florham Park NJ, August, 2010.
  • Data fusion--Resolving data conflicts for integration. [PPT]
    • NDBC tutorial, Nanchang, China, October 2009. [PPT]
    • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, September 2009.
    • VLDB tutorial, Lyon, France, August 2009.
  • Sailing the information ocean with awareness of currents: discovery and application of source dependence. [PPT]
    • Database group, UC Irvine, CA, February 2010.
    • NDBC invited talk, Nanchang, China, October 2009.
    • SKG tutorial, Zhuhai, China, October 2009.
    • Microsoft Research Asia, Beijing, China, September 2009.
    • AT&T, Florham Park NJ, July 2009.

Previous Talks

  • Data integration with uncertainty. [PPT]
  • Managing a space of heterogeneous data. [PPT]
  • Semex: A platform for personal information management and integration. [PPT]

Patents

  • Method and Apparatus for Exploring and Selecting Data Sources. Xin Dong and Divesh Srivastava. United States Patent, filed 12/2011, to be issued.
  • Online Data Fusion. Xuan Liu, Xin Dong, Beng Chin Ooi and Divesh Srivastava. United States Patent, filed 12/2011, to be issued.
  • Update Certificates. Su Chen, Xin Dong, Laks Lakshmanan, and Divesh Srivastava. United States Patent, filed 9/2010, to be issued.
  • Detecting Dependence Between Sources in Truth Discovery. Xin Dong, Laure Berti-Equille, Divesh Srivastava. United States Patent, filed 5/2009, to be issued.
  • Minimal difference query and view matching. Raghav Kaushik, Venkatesh Ganti and Xin Dong. United States Patent 7251646, issued 7/31/2007.
  • Method and apparatus for updating XML views of relational data. Philip L. Bohannon, Xin Dong, Henry F. Korth, Suryanarayan Perinkulam. United States Patent 20050165866, filed Jan 28, 2004, to be issued.

Recent Professional Activities

  • Associate editor of IEEE Data Engineering Bulletin 9/2011.
  • Co-chair of ACM SIGMOD New Researcher Symposium'12-13, Sigmod/PODS Ph.D. Symposium'12, QDB'12, WebDB'10 [Report], SKG'09 [Issue].
  • PC area chair in ICDE'13, CIKM'11.
  • PC member in PVLDB'13, Sigmod'12, VLDB'12, EDBT'12 Industry track, Sigmod'11, PVLDB'11, WAIM'11, AMW'11, PVLDB'10, ICDE'10, WWW'10, VLDB Demo'10, NTII'10, PVLDB'09, VLDB'09, CIKM'09, WebDB'09, WWW'08, CIKM'08, VLDB Demo'08, WebDB'08.
  • Referee for VLDB Journal, TODS, TCS, TOIT, TOIS, TKDE, IS.
  • NSF panelist, 2011.
  • NIH contract reviewer, 2008.

Resources

  • Here is a long and growing list of papers in databases, IR, and AI that I have collected during my research and reading.
  • Here is a collection of wisdom on career, research, life, etc.
 

Personal Life:

Xin (Luna) Dong 董欣
lunadong@gmail.com
Morristown, NJ 07960
Tel: (201)650-3494


In my personal life, I am

I would love to seek the harmony of science and art, and to find a good balance between research and life.

                             

Last update: 4/2012