|
Data Integration
- Michael Franklin, Alon Y. Halevy, and David
Maier. From databases to dataspaces: A new abstraction for
information management. Sigmod Record, Dec. 2005.
- Alon Y. Halevy, Naveen Ashish, Dina Bitton,
Michael Carey, Denise Draper, Jeff Pollock, Arnon Rosenthal, and
Vishal Sikka. Enterprise information integration: successes,
challenges and controversies. Sigmod 2005.
Information Integration Using
Logical Views
Data Exchange
- Ronald Fagin and Phokion G. Kolaitis and Renee J.
Miller and Lucian Popa. Data Exchange: Semantics and Query
Answering. ICDT, 2003. (First paper on data exchange)
- Ronald Fagin and Phokion G. Kolaitis and Lucian
Popa. Data Exchange: Getting to the Core. ACM Transactions on
Database Systems, 30(1):174-201, 2005. (Must-read)
- Ariel Fuxman and Phokion G. Kolaitis and Renee J.
Miller and Wang Chiew Tan. Peer data exchange. PODS, 2005.
- Phokion G. Kolaitis and Jonathan Panttaja and
Wang Chiew Tan. The complexity of data exchange. PODS, 2006.
- Georg Gottlob and Alan Nash. Data Exchange:
Computing Cores in Polynomial Time. PODS, 2006.
- Leonid Libkin. Data exchange and incomplete
information. PODS, 2006.
Schema Matching
- Survey
- E. Rahm and P. A. Bernstein. A survey of
approaches to automatic schema matching. VLDB Journal,
10(4):334-350, 2001. (Must-read)
- Pavel Shvaiko. A
classification of schema-based matching approaches. Unpublished.
- Element-level Matching
- Schema name & description
- P. Mitra, G. Wiederhold, and J. Jannink.
Semi-automatic integration of knowledge sources. Proc. of
Fusion, 1999.
- L. Palopoli, D. Sacca, and D. Ursino.
Semi-automatic, semantic discovery of properties from database
schemas. IDEAS, 244-253, 1998.
- C. Clifton, E. Housman, and A. Rosenthal.
Experience with a combined approach to attribute-matching across
heterogeneous databases. Proc. 7th IFIP 2.6 Working Conf. on
Database Semantics, 1997.
- D. W. Embley. Multifaceted exploitation of
metadata for attribute match discovery in information
integration. In WIIW, 2001.
- Instance
- A. Doan, J. Madhavan, P. Domingos, and A.
Halevy. Learning to map between ontologies on the
semantic web. In Proc. of the Int. WWW Conf., 2002.
- Constraint
- P. Mitra, G. Wiederhold, and M. Kersten. A
graph-oriented model for articulation of ontology
interdependencies. In Proc. of Extending Database Technologies,
2000.
- S. Bergamaschi, S. Castano, M. Vincini, and D.
Beneventano. Semantic integration of heterogeneous
information sources. Data & Knowledge Engineering, 36(3),
2001.
- J. Kang and J. Naughton. On schema matching
with opaque column names and data values. In Proc. of SIGMOD,
2003.
- Structure-level Matching
- T. Milo and S. Zohar. Using schema matching to
simplify heterogeneous data translation. In Proceedings
of the International Conference on Very Large Databases (VLDB),
1998.
- L. Palopoli, D. Sacca, D. Ursino. An automatic
technique for detecting type conflicts in database schemas. CIKM,
306-313, 1998.
- J. Madhavan, P. Bernstein, and E. Rahm.
Generic schema matching with Cupid. In Proceedings of the
International Conference on Very Large Databases (VLDB), 2001.
- S. Melnik, H. Garcia-Molina, and E. Rahm.
Similarity Flooding: A Versatile Graph Matching Algorithm.
In Proc. of ICDE, 2002.
- B. S. Lerner. A model for compound type changes
encountered in schema evolution. ACM TODS 25(1):83-127, 2000.
- K. Zhang and D. Shasha. Approximate tree
pattern matching. Pattern matching in strings, trees, and
arrays, 341-371, 1997.
- D. Calvanese, S. Castano, F. Guerra, D. Lembo, M.
Melchiorri, G. Terracina, D. Ursino, and M. Vincini.
Towards a Comprehensive Framework for Semantic Integration of
Highly Heterogeneous Data Sources.
In Proc. of the 8th Int. Workshop on Knowledge Representation meets
Databases (KRDB2001), 2001.
- L. Xu and D. Embley. Discovering Direct and
Indirect Matches for Schema Elements. In DASFAA,
2003.
- Combining Matchers
- A. Doan, P. Domingos, and A. Halevy.
Reconciling schemas of disparate data sources: a machine
learning approach. In Proc. of SIGMOD, 2001.
- H.-H. Do and E. Rahm. COMA - A System for
Flexible Combination of Schema Matching Approaches.
In Proc. of VLDB, 2002.
- Cluster-based Matching
- W. Wu, C. Yu, A. Doan, and W. Meng. An
interactive clustering-based approach to integrating source
query interfaces on the deep web. In Proc. of SIGMOD, 2004.
- B. He and K. C.-C. Chang. Statistical schema
integration across the deep web. In Proc. of SIGMOD,
2003.
- W. Li and C. Clifton. SemInt: a tool for
identifying attribute correspondences in heterogeneous
databases using neural networks. Data & Knowledge Engineering,
33(1), 2000.
- S. Castano, V. De Antonellis, and S. De Capitani
di Vimercati. Global viewing of heterogeneous data sources.
IEEE Trans Data Knowl Eng 13(2):277-297, 2001.
- Learn from Previous Matching
- J. Madhavan, P. Bernstein, A. Doan, and A.
Halevy. Corpus-based schema matching. In Proc. of ICDE,
2005.
- Jayant Madhavan, Philip A.
Bernstein, Kuang Chen, Alon Halevy, and Pradeep Shenoy.
Corpus-based Schema Matching. In Workshop on Information
Integration on the Web at IJCAI, 2003.
- Query Discovery
- R. J. Miller, L. M. Haas, and M. A. Hernandez.
Schema mapping as query discovery. In VLDB, 2000.
Meta Data Management
- Meta Data Applications
- Lucian Popa, Yannis Velegrakis, Renee J. Miller, Mauricio A.
Hernández, and Ronald Fagin. Translating Web Data. VLDB, 2002.
- Stefano Spaccapietra and Christine Parent. View Integration: A Step
Forward in Solving Structural Conflicts. TKDE 6(2):258-274, 1994.
- Data Models
- Natalya F. Noy, Mark A. Musen, Jose L.V. Mejino, and Cornelius
Rosse. Pushing the Envelope: Challenges in a Frame-Based
Representation of Human Anatomy. SMI Report Number: SMI-2002-0925,
http://smi-web.stanford.edu/pubs/SMI_Abstracts/SMI-2002-0925.html.
- Richard Hull. Relative Information Capacity of Simple Relational
Database Schemata. PODS, 97-109, 1984.
- Mechanisms
- Paolo Atzeni and Riccardo Torlone. Management of Multiple Models in
an Extensible Database Design Tool. EDBT, 79-95, 1996.
- Philip A. Bernstein. Applying Model Management to Classical Meta
Data Problems. Submitted for publication.
- Peter Buneman, Susan B. Davidson, and Anthony Kosky. Theoretical
Aspects of Schema Merging. EDBT, 152-167, 1992.
Object Matching (a.k.a. Record Linkage)
- History and overview:
- Origination: H. Newcombe, J. Kennedy, S. Axford, and A. James.
Automatic linkage of vital records. Science, 130(3381):954-959, 1959.
- First formalization: Ivan Fellegi and Alan Sunter.
A theory for record linkage. Journal of the American Statistical
Association, 64:1183-1210, 1969.
- Survey:
- William Winkler.
Overview of record linkage and current research directions.
Technical Report, Statistical Research Division, U.S. Bureau of
the Census, 2006.
- Lifang Gu, Rohan
Baxter, Deanne Vickers, and Chris Rainsford.
Record Linkage: Current Practice and Future Directions.
Unpublished, 2004.
- M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, and S. Fienberg.
Adaptive name matching in information integration. IEEE Intelligent
Systems Special Issue on Information Integration on the Web, September
2003. (Must-read)
- Mohamed G. Elfeky, Vassilios S. Verykios, and Ahmed K. Elmagarmid.
TAILOR: A record linkage Toolbox.
- Field-wise Matching (String comparison)
- Survey:
William Cohen, Pradeep Ravikumar and Stephen Fienberg.
A Comparison of String Distance Metrics for Name-Matching Tasks.
In Workshop on Information Integration on the Web (IIW), at IJCAI
2003. (Must-read)
- William Winkler and Edward Porter.
Approximate String Comparison and its effect on an Advanced Record
Linkage System. Technical report, Statistical Research Division,
U.S. Bureau of the Census, 1997.
- Adaptive string matching
- Mikhail Bilenko and Raymond Mooney.
Adaptive Duplicate Detection Using Learnable String Similarity
Measures. In Proceedings of the 9th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD-2003),
pp.39-48, Washington, DC, August 2003.
- S. Tejada, C. Knoblock, and S. Minton.
Learning domain-independent string transformation weights for high
accuracy object identification. In SIGKDD, 2002.
- Record-wise Matching
- Rule-based:
- H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita.
Declarative data cleaning: language, model, and algorithms. In
VLDB, pages 371-380, 2001.
- L. Jin, C. Li, and S. Mehrotra.
Efficient Record Linkage in Large Data Sets. In DASFAA, 2003.
- M. L. Lee, T. W. Ling, and W. L. Low.
Intelliclean: a knowledge-based intelligent data cleaner. In
SIGKDD, pages 290-294, 2000.
- EM Method:
- William Winkler.
Using the EM Algorithm for Weight Computation in the
Fellegi-Sunter Model of Record Linkage. Technical Report
RR2000/05, Statistical Research Division, Bureau of Census, 2000.
- William Winkler:
Advanced methods for record linkage. Technical Report, 1994.
- Learning:
- Jose C. Pinheiro and Don X. Sun.
Methods for linking and mining massive heterogeneous databases.
AAAI, 1998.
- W. Cohen and J. Richman.
Learning to match and cluster large high-dimensional data sets for
data integration, 2002.
- Sunita Sarawagi and Anuradha Bhamidipaty.
Interactive Deduplication using Active Learning. In
Proceedings of the ACM SIGKDD, 2002.
- Decision Tree: S. Tejada, C. Knoblock, and S. Minton. Learning
domain-independent string transformation weights for high accuracy
object identification. In SIGKDD, 2002.
- Bayes and SVM: S. Sarawagi and A. Bhamidipaty.
Interactive deduplication using active learning. In SIGKDD, 2002.
- Secondary Knowledge
- A. Doan, Y. Lu, Y. Lee, and J. Han.
Object matching for information integration: a profiler-based
approach. In IIWeb, 2003.
- X. Dong and A. Halevy.
A Platform for Personal Information Management and Integration. In
Proc. of CIDR, 2005.
- M. Michalowski, S. Thakkar, and C. A. Knoblock.
Exploiting secondary sources for unsupervised record linkage. In
IIWeb, 2004.
- Collective Model
- William Cohen, David McAllester, and Henry
Kautz.
Hardening Soft Information Sources. In Proceedings of ACM SIGKDD,
2000, 255-259.
- Hanna Pasula, Bhaskara Marthi, Brian Milch,
Stuart Russell, and Ilya Shpitser.
Identity Uncertainty and Citation Matching. In Proceedings of
the International Conference on Advances in Neural Information
Processing Systems (NIPS) 15, 2003.
- Andrew McCallum and Ben Wellner.
Toward conditional models of identity uncertainty with application
to proper noun coreference. IJCAI 2003.
- Parag and P. Domingos.
Multi-relational record linkage. In MRDM, 2004.
- R. Ananthakrishna, S. Chaudhuri, and V. Ganti.
Eliminating Fuzzy Duplicates in Data Warehouses. In Proc. of VLDB,
2002.
- I. Bhattacharya and L. Getoor.
Iterative record linkage for cleaning and integration. In DMKD,
2004.
- D. V. Kalashnikov, S. Mehrotra, and Z. Chen.
Exploiting relationships for domain-independent data cleaning. In
SIAM Data Mining (SDM), 2005.
- Xin Dong, Alon Halevy, and Jayant Madhavan.
Reference reconciliation in complex data spaces. In Sigmod, 2005.
- Efficiency and Scalability
- Mauricio Hernandez and Salvatore Stolfo.
The Merge/Purge Problem for Large Databases. In Proceedings of
the ACM SIGMOD Conference, 1995.
- Andrew McCallum, Kamal Nigam and Lyle Ungar.
Efficient Clustering of High-Dimensional Data Sets with Application
to Reference Matching. In Proceedings of the ACM SIGKDD, 2000.
(Must-read: the classic paper on canopy clustering)
- Surajit Chaudhuri, Kris
Ganjam, Venkatesh Ganti, and Rajeev Motwani.
Robust and Efficient Fuzzy Match for Online Data cleaning. In
Proceedings of the ACM SIGMOD, 2003.
- Rohit Ananthakrishna,
Surajit Chaudhuri, and Venkatesh Ganti.
Eliminating Fuzzy Duplicates in Data Warehouses. VLDB 2002.
Data Fusion
- J. Bleiholder and F. Naumann.
Conflict handling strategies in an
integrated information system. In WWW'06.
- M. Wu and A. Marian.
Corroborating answers from multiple web
sources. In WebDB'07.
- X. Yin, J. Han, and P. S. Yu.
Truth discovery with multiple conflicting
information providers on the web. In
SIGKDD'07.
Data Integration with Uncertainty
- Probabilistic Schema Matching
- Xin Dong, Alon Halevy, and Cong
Yu. Data integration with
uncertainty. VLDB'07.
- Carmel Domshlak, Avigdor Gal, and
Haggai Roitman. Rank aggregation for
automatic schema matching. TKDE 19(4),
2007.
- Avigdor Gal. Why is schema
matching tough and what can we do about
it. Sigmod Record, 35(4), 2006.
- Avigdor Gal, Ateret Anaby-Tavor,
Alberto Trombetta, Danilo Montesi. A
framework for modeling and evaluating
automatic semantic reconciliation.
VLDB Journal, 2003.
- Henrik Nottelmann and Umberto
Straccia. A probabilistic,
logic-based framework for automated web
directory alignment.
- Henrik Nottelmann and Umberto
Straccia. Information retrieval and
machine learning for probabilistic
schema matching. Information
Processing and Management 43:552-576,
2007.
- Generating Probabilistic Mediated
Schemas
- Anish Das Sarma, Xin Dong, and
Alon Halevy. Bootstrapping
pay-as-you-go data integration systems.
Sigmod'08.
- M. Magnani, N. Rizopoulos, P.
McBrien, and D. Montesi. Schema
integration based on uncertain semantic
mappings. Lecture Notes in Computer
Science, 2007.
|