Review Article | Volume 28, Issue 1, P73-82, March 2008

Multi-Database Mining

      Biomedical data useful for data mining are often distributed across multiple databases. These databases may be aggregated using several techniques to create single data sets that may be mined using standard approaches; however, separate databases may, in their design or data representation, capture information that is analytically useful and that is lost on integration. Recent techniques for mining multiple databases simultaneously but separately may preserve and leverage the unique perspectives within each database. This article presents an example, “dual mining,” in which concurrent analysis of a target database with a related knowledge base can improve the identification of association patterns in the target most likely to be of interest for further analysis.
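      The abstract's "dual mining" idea can be pictured with a minimal sketch. This is one possible reading, not the article's method: it assumes both sources are reduced to sets of co-occurring terms, that pair support is a reasonable stand-in for an association pattern, and that a pattern is "of interest" when it is frequent in the target but weakly represented in the knowledge base. The function names, the threshold, and the toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_support(records):
    """Return the fraction of records in which each unordered item pair co-occurs."""
    counts = Counter()
    for record in records:
        for pair in combinations(sorted(set(record)), 2):
            counts[pair] += 1
    total = len(records)
    return {pair: n / total for pair, n in counts.items()} if total else {}


def dual_mine(target, knowledge, min_target_support=0.5):
    """Mine both sources separately, then rank frequent target pairs so that
    those least supported by the knowledge base come first (hypothetical rule)."""
    target_support = pair_support(target)
    knowledge_support = pair_support(knowledge)
    candidates = [
        (pair, support, knowledge_support.get(pair, 0.0))
        for pair, support in target_support.items()
        if support >= min_target_support
    ]
    # Low knowledge-base support first, then high target support, then pair name.
    return sorted(candidates, key=lambda c: (c[2], -c[1], c[0]))


# Toy, made-up records: each set lists terms that appear together.
target = [
    {"diabetes", "neuropathy"},
    {"diabetes", "neuropathy", "drug_x"},
    {"diabetes", "drug_x"},
    {"diabetes", "neuropathy", "drug_x"},
]
knowledge = [
    {"diabetes", "neuropathy"},
    {"diabetes", "neuropathy"},
    {"diabetes", "retinopathy"},
]

ranked = dual_mine(target, knowledge)
for pair, t_sup, k_sup in ranked:
    print(pair, round(t_sup, 2), round(k_sup, 2))
```

      In this toy run, pairs involving the made-up term drug_x rank ahead of the diabetes and neuropathy pair, because the latter is already well represented in the knowledge base. Each source is mined on its own and only the resulting patterns are compared, which mirrors the "simultaneously but separately" framing above.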


      Published in: Clinics in Laboratory Medicine
