Review Article | Volume 28, Issue 1, P55-71, March 2008

The Development of Health Care Data Warehouses to Support Data Mining

  • Jason A. Lyman
    Correspondence
    Corresponding author. Division of Clinical Informatics, Department of Public Health Sciences, Suite 3181 West Complex, 1335 Lee Street, University of Virginia Health System, Charlottesville, VA 22908.
    Affiliations
    Division of Clinical Informatics, Department of Public Health Sciences, University of Virginia, Suite 3181 West Complex, 1335 Hospital Drive, Charlottesville, VA 22908, USA

    Clinical Data Repository, University of Virginia School of Medicine, Suite 3181 West Complex, 1335 Hospital Drive, Charlottesville, VA 22908, USA
  • Kenneth Scully
    Affiliations
    Division of Clinical Informatics, Department of Public Health Sciences, University of Virginia, Suite 3181 West Complex, 1335 Hospital Drive, Charlottesville, VA 22908, USA

    Clinical Data Repository, University of Virginia School of Medicine, Suite 3181 West Complex, 1335 Hospital Drive, Charlottesville, VA 22908, USA
  • James H. Harrison Jr.
    Affiliations
    Division of Clinical Informatics, Department of Public Health Sciences, University of Virginia, Suite 3181 West Complex, 1335 Hospital Drive, Charlottesville, VA 22908, USA

    Department of Pathology, University of Virginia, Suite 3181 West Complex, 1335 Hospital Drive, Charlottesville, VA 22908, USA
      Clinical data warehouses offer tremendous benefits as a foundation for data mining. By serving as a source for comprehensive clinical and demographic information on large patient populations, they streamline knowledge discovery efforts by providing standard and efficient mechanisms to replace time-consuming and expensive original data collection, organization, and processing. Building effective data warehouses requires knowledge of and attention to key issues in database design, data acquisition and processing, and data access and security. In this article, the authors provide an operational and technical definition of data warehouses, present examples of data mining projects enabled by existing data warehouses, and describe key issues and challenges related to warehouse development and implementation.