KAREN DRUKKER AND MARYELLEN GIGER, UNIVERSITY OF CHICAGO
The potential for bias and unfairness in the integration of AI into health care research and diagnostics, particularly in medical image analysis, must be addressed to ensure equitable and effective outcomes for all patients. As our Medical Imaging and Data Resource Center (MIDRC) team from the University of Chicago and other institutions has pointed out, biases can arise during any of the five steps of the imaging AI model development pipeline: data collection, data preparation and annotation, model development, model evaluation, and model deployment in real-world health care settings.
Bias in health care can result in overdiagnosis of disease or missed abnormalities in certain populations. For example, when the training data predominantly represent a specific demographic, the resulting AI model may perform poorly on images from other demographics, producing a disparate impact in the analysis. A second potential pitfall is bias introduced during data collection, curation, and annotation, which arises from subjective decisions made during the annotation process, when human annotators may inadvertently introduce their own interpretations. A third example involves biases that can arise during the development of AI models themselves, such as the choice of algorithms or unintentional biases in the training process.
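As a rough illustration of how such representation gaps might be surfaced before model development, the short Python sketch below tallies each demographic group's share of a training set and flags groups that fall below a chosen threshold. The column name, file name, and 10% cutoff are hypothetical choices for illustration, not MIDRC recommendations.

    # Illustrative sketch: audit the demographic composition of a training set
    # and flag underrepresented groups before model development begins.
    import pandas as pd

    def audit_representation(df, group_col="demographic_group", min_fraction=0.10):
        # Share of training cases contributed by each group
        shares = df[group_col].value_counts(normalize=True)
        # Groups whose share falls below the chosen threshold
        flagged = shares[shares < min_fraction]
        return shares, flagged

    # Hypothetical usage with a metadata table of training cases:
    # metadata = pd.read_csv("training_set_metadata.csv")
    # shares, underrepresented = audit_representation(metadata)

A check of this kind does not by itself ensure fairness, but it makes gaps in representation visible early, when they are easiest to address.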
Historically marginalized communities, individuals from low socioeconomic backgrounds, the elderly, patients with rare diseases, and underrepresented ethnic or racial groups are more likely to experience disparate impact from AI models. For example, an AI algorithm trained on data collected from patients of one race might perform differently on data collected from patients of another race.
To measure disparate impact, bias, and unfairness, appropriate metrics that capture differential performance across groups are necessary. Selecting these metrics and interpreting their results require careful consideration of the specific context, input from stakeholders such as doctors, patients, and researchers, and attention to ethical considerations such as providing transparency without compromising patient privacy. It is also important to note that no single metric can comprehensively capture all aspects of bias and unfairness.
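As one concrete example of such a metric, the Python sketch below computes a model's area under the ROC curve (AUC) separately for each patient subgroup and reports the largest gap between subgroups. The column names and data layout are hypothetical, and the AUC gap is only one of many possible fairness measures, not a prescribed MIDRC metric.

    # Illustrative sketch: per-subgroup AUC and the largest gap between subgroups
    # as one simple indicator of differential model performance.
    import pandas as pd
    from sklearn.metrics import roc_auc_score

    def auc_gap_by_group(df, group_col="race", label_col="label", score_col="ai_score"):
        aucs = {}
        for group, subset in df.groupby(group_col):
            # AUC is defined only when both classes are present in the subgroup
            if subset[label_col].nunique() == 2:
                aucs[group] = roc_auc_score(subset[label_col], subset[score_col])
        # Largest pairwise difference in AUC across subgroups
        gap = max(aucs.values()) - min(aucs.values()) if aucs else float("nan")
        return aucs, gap

    # Hypothetical usage on a held-out test set of model predictions:
    # results = pd.read_csv("test_set_predictions.csv")
    # per_group_auc, disparity = auc_gap_by_group(results)

In practice, such a gap would be reported alongside other measures, such as subgroup sensitivity and specificity at clinically relevant operating points, and interpreted in context with stakeholder input.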
Mitigation strategies and best practices include collecting representative data sets, rigorously validating AI models on diverse data sets and patient populations, developing interpretable and transparent models, and encouraging collaboration among researchers, medical professionals, industry experts, and government agencies to address bias in AI for image analysis.
Government agencies such as the National Institutes of Health (NIH) fund research, promote collaborations, and support initiatives that address bias and fairness in AI for medical imaging. The MIDRC, funded by the National Institute of Biomedical Imaging and Bioengineering (NIBIB), is known for developing solutions for bias mitigation and equitable AI development. These solutions include establishing standards for the size and diversity of data sets, open access and data-sharing principles, comprehensive data annotations, quality control and standardization measures, integration of clinical and imaging data, interoperability with other data commons, promotion of collaboration and partnerships, and research and development initiatives, including a dedicated bias and diversity working group.
Addressing bias in AI for health care requires a collective effort to advise researchers on how to avoid biases and to ensure that they incorporate bias checks and mitigation methods when reporting their findings and results. Regulatory bodies should continue to collaborate with industry and academia to develop guidelines and policies that ensure fairness and bias mitigation in medical AI applications.
Meet the authors
Karen Drukker, Ph.D., is a research associate professor of radiology at the University of Chicago, where she has been involved in medical imaging research for more than 20 years. She received her Ph.D. in physics from the University of Amsterdam. Her research interests include machine learning applications in the detection, diagnosis, and prognosis of disease, focusing on rigorous training/testing protocols, generalizability, performance evaluation, and bias and fairness of AI. She is a fellow of SPIE and AAPM; email: [email protected].
Maryellen Giger, Ph.D., is the A.N. Pritzker Distinguished Service Professor at the University of Chicago. Her research involves computer-aided diagnosis/machine learning in medical imaging for cancer, bone diseases, and now COVID-19, and she is the contact principal investigator on the NIBIB-funded Medical Imaging and Data Resource Center (MIDRC). She is a member of the NAE; a recipient of the AAPM Coolidge Gold Medal, the SPIE Harrison H. Barrett Award, and the RSNA Outstanding Researcher Award; and a Fellow of AAPM, AIMBE, SPIE, and IEEE; email: [email protected].
The views expressed in ‘Biopinion’ are solely those of the authors and do not necessarily represent those of Photonics Media. To submit a Biopinion, send a few sentences outlining the proposed topic to [email protected]. Accepted submissions will be reviewed and edited for clarity, accuracy, length, and conformity to Photonics Media style.