Review Article | Clinics in Laboratory Medicine, Volume 28, Issue 1, p. 9-35, March 2008

Introduction to Data Mining for Medical Informatics

Data mining comprises a set of techniques for discovering patterns in large databases. This article provides an introduction to common data mining techniques with a view toward their practical use. It begins by describing methods for discovering and exploring associations among observations and variables, then turns to methods for prediction, which uncover relationships between sets of variables. The article concludes with a description of evaluative techniques useful for assessing the results of data mining.
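As a concrete illustration of the association-discovery methods the abstract mentions, the sketch below mines simple association rules (support and confidence over item pairs) from a toy set of clinical "transactions." The symptom data, thresholds, and variable names are illustrative assumptions for this sketch, not drawn from the article itself.

```python
# Minimal association-rule sketch: for every ordered pair of items X -> Y,
# keep the rule if support({X, Y}) and confidence = support({X, Y}) / support({X})
# clear illustrative thresholds. Data and thresholds are hypothetical.
from itertools import combinations

transactions = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"cough", "fatigue"},
    {"fever", "fatigue"},
    {"fever", "cough", "fatigue"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

min_support, min_confidence = 0.4, 0.7
items = sorted(set().union(*transactions))
rules = []
for x, y in combinations(items, 2):
    for lhs, rhs in ((x, y), (y, x)):          # both rule directions
        s = support({lhs, rhs})
        if s >= min_support:
            conf = s / support({lhs})
            if conf >= min_confidence:
                rules.append((lhs, rhs, round(s, 2), round(conf, 2)))

for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s}  confidence={c}")
```

Full-scale implementations (e.g., the Apriori family surveyed in the article's prediction-and-association literature) prune the search over itemsets rather than enumerating all pairs, but the support/confidence criterion shown here is the same.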

