Review Article | Volume 28, Issue 1, P127-143, March 2008

Data Mining for Biomarker Development: A Review of Tissue Specificity Analysis

      Developing a novel biomarker requires a significant commitment of resources to translate candidate markers into clinical assays. Consequently, it is imperative that high-quality candidates be selected early in a biomarker development program. High-throughput gene expression data are routinely used to identify transcripts differentially expressed in diseased versus normal samples. Mining Expressed Sequence Tag (EST), Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), and microarray expression databases can provide additional information on the expression of candidate biomarkers across multiple tissues, organs, and disease states. From this information, quantitative measures of each gene's tissue specificity are computed and used to guide candidate biomarker selection.
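      The abstract does not say which quantitative measures of tissue specificity are meant. As an illustration only, the sketch below uses one measure that appears in the tissue-specificity literature: the Shannon entropy of a gene's relative expression across tissues, rescaled so that 1.0 means expression confined to a single tissue and 0.0 means uniform expression. The function names, the rescaling, and the example counts are assumptions made for this sketch, not values or methods taken from the article.

```python
import math


def tissue_entropy(expression):
    """Shannon entropy (in bits) of a gene's relative expression across tissues.

    `expression` maps a tissue name to a non-negative expression level,
    for example normalized EST or SAGE tag counts (hypothetical input format).
    """
    levels = list(expression.values())
    if any(level < 0 for level in levels):
        raise ValueError("expression levels must be non-negative")
    total = sum(levels)
    if total <= 0:
        raise ValueError("gene has no detectable expression in any tissue")

    entropy = 0.0
    for level in levels:
        if level > 0:  # a tissue with zero expression contributes nothing
            p = level / total
            entropy -= p * math.log2(p)
    return entropy


def specificity_score(expression):
    """Rescale entropy to [0, 1]: 1.0 = one tissue only, 0.0 = uniform."""
    n_tissues = len(expression)
    if n_tissues < 2:
        raise ValueError("need at least two tissues to measure specificity")
    return 1.0 - tissue_entropy(expression) / math.log2(n_tissues)


# Hypothetical normalized counts for two candidate genes.
candidate_a = {"prostate": 96, "liver": 2, "brain": 1, "lung": 1}
candidate_b = {"prostate": 25, "liver": 25, "brain": 25, "lung": 25}

# Rank candidates from most to least tissue-specific.
ranked = sorted(
    {"candidate_a": candidate_a, "candidate_b": candidate_b}.items(),
    key=lambda item: specificity_score(item[1]),
    reverse=True,
)
```

      In this sketch the gene concentrated in one tissue ranks ahead of the uniformly expressed one. A real screen would also need to account for library size and sampling noise, which this example ignores.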
