Insights
Opportunities and Challenges for the Application of Artificial Intelligence in Evaluations of Health Care Services
We are an interdisciplinary group of students and researchers affiliated with the Patient Partnered Diagnostic Center of Excellence, part of the Patient-Oriented Research team at Michael Garron Hospital and the University of Toronto. Drawing on our diverse expertise in AI, engineering, data science, and health systems, we examine the opportunities and challenges that Artificial Intelligence (AI) presents in evaluating healthcare services and interventions.
The Institute for Healthcare Improvement’s Quintuple Aim outlines five key goals for healthcare systems: improving population health, enhancing the patient experience, reducing costs, supporting care team well-being, and advancing health equity. Evaluating healthcare practices is essential for measuring progress, identifying successes, and driving improvements. Healthcare evaluation is an iterative process that deepens our understanding of interventions and supports evidence-informed decision-making. This process includes five steps, as shown in Figure 1, leading to program modifications based on findings. However, this evaluation cycle can be time- and resource-intensive in an already strained healthcare system. AI, a rapidly evolving field, offers tools that simulate human intelligence and can support these evaluation processes. Our focus is on three key categories:
- Machine Learning (ML): Algorithms that learn from data and make predictions without requiring task-specific programming.
- Large Language Models (LLMs): A subset of ML designed for language processing tasks such as text summarization and translation.
- Generative AI: A class of models that extends the capabilities of ML and LLMs to create diverse types of content, such as text, images, and videos, based on user-provided prompts.
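To make the ML category above concrete, the following toy sketch shows a one-nearest-neighbour classifier: the decision rule is "learned" directly from labelled examples rather than hand-coded. All feature names, labels, and values here are hypothetical, invented purely for illustration.

```python
# Toy 1-nearest-neighbour classifier: the "program" is the labelled data itself.
def predict(train, new_point):
    """Return the label of the training example closest to new_point."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda ex: dist(ex[0], new_point))
    return nearest[1]

# Hypothetical data: (wait_time_hours, satisfaction_score) -> outcome label
train = [((1.0, 9.0), "satisfied"), ((6.0, 2.0), "dissatisfied"),
         ((2.0, 8.0), "satisfied"), ((5.0, 3.0), "dissatisfied")]

print(predict(train, (1.5, 8.5)))  # nearest labelled example is "satisfied"
```

No task-specific rules about wait times or satisfaction are written anywhere in the function; changing the training data changes the predictions, which is the defining trait of ML.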
In this article, we explore the application of AI to the evaluation of healthcare services, highlighting its opportunities and limitations. We have structured the article to align with the evaluation cycle. While AI poses ethical and privacy challenges, these issues fall beyond the scope of this article.
Figure 1: The Evaluation Framework
Selected Opportunities and Challenges of AI
Stage 1: Defining the Evaluation Approach
Opportunity: The first step in selecting an evaluation approach is to identify the project’s stage of maturity and define the overarching evaluation question. LLMs are particularly useful at this stage, as they can assist in explaining, comparing, and suggesting suitable methods based on the intervention being evaluated.
Challenge: A significant challenge in applying AI at this stage is the "black box" nature of some AI models, which lack transparency in how they arrive at certain conclusions. This raises concerns about the validity of their recommendations.
Stages 2 and 3: Understanding the Intervention and Developing the Logic Model
Opportunity: Understanding the intervention involves gathering information from stakeholders, literature reviews, document analysis, and environmental scans. The next step, developing the logic model, focuses on creating a framework that outlines the intervention’s components and how they will lead to desired outcomes. Generative AI can support this process by synthesizing evidence, identifying relevant literature, summarizing documents, and developing questions for stakeholder engagement. It can also assist in identifying resources, activities, outputs, outcomes, and impacts, which are integral to developing the logic model.
Challenge: Evaluators applying AI at this stage must be aware of the potential for errors or hallucinations, in which an AI model presents fabricated information as fact, interleaved with accurately synthesized or generated content. For effective and reliable applications, users need a strong understanding of the evaluated processes to discern errors in the AI-generated recommendations.
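The logic-model components named above (resources, activities, outputs, outcomes, impacts) can be sketched as a simple data structure that an evaluator, or an AI assistant drafting a first pass for human review, could populate. The field contents below are hypothetical examples for an imagined discharge-follow-up intervention, not drawn from any real program.

```python
from dataclasses import dataclass, field

@dataclass
class LogicModel:
    """Minimal logic-model skeleton: each field lists that component's items."""
    resources: list = field(default_factory=list)   # inputs (staff, funding)
    activities: list = field(default_factory=list)  # what the program does
    outputs: list = field(default_factory=list)     # direct products
    outcomes: list = field(default_factory=list)    # short/medium-term changes
    impacts: list = field(default_factory=list)     # long-term system effects

# Hypothetical example: a post-discharge follow-up intervention
model = LogicModel(
    resources=["nursing staff", "call-centre software"],
    activities=["post-discharge phone calls"],
    outputs=["calls completed within 72 hours"],
    outcomes=["fewer 30-day readmissions"],
    impacts=["improved population health"],
)
print(model.outputs)  # prints ['calls completed within 72 hours']
```

Keeping the model in a structured form like this makes AI-drafted entries easy to audit field by field, which matters given the hallucination risk described above.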
Stages 4 and 5: Choosing Measures, Data Sources, and Analyzing Results
Opportunity: Choosing appropriate metrics and data sources is essential to evaluate interventions. AI-driven tools can streamline this process by analyzing large datasets, improving data precision, and reducing costs. During the analysis, feedback, and reporting stage, AI can uncover complex relationships within data that are difficult for humans to detect and that traditional statistical methods, which require hypotheses to be specified in advance, may miss. Explainable AI tools can help make these relationships more transparent and interpretable, enabling further knowledge discovery. Generative AI also improves reporting by tailoring findings to stakeholder needs, creating visuals, and enhancing accessibility.
Challenge: A shared concern across both stages is the risk of biased training data. Underrepresentation of certain populations in datasets can perpetuate disparities and skew results. Additionally, many ML models emphasize correlation rather than causation, which requires careful interpretation. Since many models rely on retrospective data, they may fail to capture evolving trends, which poses challenges for maintaining long-term performance. To mitigate these issues, regular updates, rigorous validation, and interdisciplinary collaboration are essential. Engaging data scientists, healthcare professionals, and ethicists ensures a comprehensive approach to minimizing biases.
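One concrete way to surface the underrepresentation risk described above is to report a model's accuracy per subgroup rather than only overall. The sketch below uses hypothetical prediction records (group labels, true outcomes, and predictions are all invented) to show how a strong aggregate score can hide weak performance for a smaller group.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, y_true, y_pred). Returns {group: accuracy}."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical predictions: the underrepresented "rural" group fares worse.
records = [("urban", 1, 1), ("urban", 0, 0), ("urban", 1, 1), ("urban", 0, 0),
           ("rural", 1, 0), ("rural", 0, 0)]
print(accuracy_by_group(records))  # {'urban': 1.0, 'rural': 0.5}
```

Here the overall accuracy is 5/6, yet the rural subgroup's accuracy is only 0.5, exactly the kind of disparity that regular validation with disaggregated metrics is meant to catch.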
Equity and Partner Engagement
Equity and partner engagement are crucial considerations for healthcare evaluators. AI can enhance these efforts by refining criteria for underrepresented patient groups through improved data synthesis, collection, and visualization. However, to fully leverage AI’s potential, it is critical to address biases in historical data and apply an equity lens in the development of AI tools. This ensures that evaluation processes are inclusive and fair. Partnering with patients and incorporating their perspectives into the evaluation process helps ensure that interventions reflect their needs, priorities, and lived experiences. This fosters trust, promotes transparency, and helps create equitable outcomes tailored to the communities being served.
Conclusion
AI offers unprecedented opportunities to enhance healthcare service evaluation by enabling advanced data analysis, improving data visualization, generating insights through LLMs and Generative AI, and supporting tasks such as background research, logic model development, data analysis, and reporting. However, challenges such as limited contextual adaptability, the black-box nature of some models, and potential biases underscore the need for skilled evaluators to guide the careful implementation of AI tools. Although this article highlights key opportunities and challenges, further research is needed to explore AI’s long-term implications, particularly its ability to address equity, reduce biases, and adapt to diverse contexts.
This project was funded under grant number R18HS029356 from the Agency for Healthcare Research and Quality (AHRQ), U.S. Department of Health and Human Services (HHS) (Miller, Smith, Giardina, PIs). The authors are solely responsible for this document’s contents, findings, and conclusions, which do not necessarily represent the views of AHRQ. Readers should not interpret any statement in this report as an official position of AHRQ or of HHS.
About the Author(s)
Lidia Mateus, MSc is a first-year doctoral student in the Health Systems Research program at the University of Toronto.
Yiyang Qu, MASc, is a research data analyst with the Patient-Oriented Research Team at Michael Garron Hospital.
Haoyan Zheng is a Summer Scholar with the Patient-Oriented Research team at Michael Garron Hospital and an Engineering Science student at the University of Toronto.
Sara Shearkhani, PhD, is a research scientist at Michael Garron Hospital and East Toronto Health Partners, and an assistant professor (status) at the Institute of Health Policy, Management and Evaluation at the University of Toronto.