Review Article | Volume 28, Issue 1, P37-54, March 2008

Open-Source Tools for Data Mining

With the growing volume of biomedical databases and repositories, the need for tools that support their analysis and knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for the computational treatment of such data. In this article, we discuss the evolution of the open-source toolboxes that data mining researchers and enthusiasts have developed over the past few decades, and we review several currently available open-source data mining suites. The approaches we review differ in their data mining methods and user interfaces, and together they demonstrate that the field and its tools are ready to be fully exploited in biomedical research.

Published in Clinics in Laboratory Medicine.