Review Article | Volume 43, Issue 1, P29-46, March 2023



Clinical Artificial Intelligence

Design Principles and Fallacies
Published: December 13, 2022 | DOI:


Clinics in Laboratory Medicine

