M.I.J.O.: Framework for Evaluating Applicability and Limitations of Biomedical Datasets with AI Assistants

Background

Dataset limitations bound both clinical and artificial intelligence (AI) models in biomedical research, yet assessing whether a dataset is applicable to a given research question can be difficult. A team's expertise may cover clinical practice and machine learning but not epidemiology or other relevant domains, and researchers may lack the resources to track limitations as they emerge. The consequences are amplified when subpopulations differ in ways that alter biological understanding and treatment strategies. For instance, oncogenic drivers in lung cancer differ between never-smokers and ever-smokers, yet ever-smokers dominate influential datasets, which may lead to poor generalization to non-smoking Asian women and other underrepresented groups.

Methods

We propose Model Integrity Joint Observability (M.I.J.O.), a framework for joint human-AI assessment of whether a dataset is applicable to a research question. We introduce a schema for encoding applicability alerts, a workflow artifact for creating them, and a decentralized protocol for publishing and discovering them.

Results

As case studies, we assessed two influential lung cancer datasets against research questions on viral association and on never-smokers. The sequencing libraries used by both datasets depleted non-polyadenylated viral RNAs, reducing sensitivity to Epstein-Barr virus, adenovirus, and other pathogens. Furthermore, one dataset contains an estimated 6 never-smoker samples out of 37 (16%), a subgroup so small that even a true viral subtype with 39% prevalence could remain undetected.
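
The abstract does not say how the 39% figure was derived. One reading that reproduces it is a zero-event binomial bound: if none of n independent samples is positive, any prevalence up to 1 - alpha^(1/n) is still compatible with the data at confidence 1 - alpha. The sketch below works through that arithmetic for n = 6. The 95% confidence level, the independence assumption, and the function names are assumptions made for illustration, not details taken from the paper.

```python
def zero_event_upper_bound(n_samples: int, confidence: float = 0.95) -> float:
    """Upper bound on prevalence when 0 of n_samples are positive.

    Solves (1 - p) ** n_samples = 1 - confidence for p, assuming
    independent samples (an illustrative assumption).
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_samples)


def prob_all_negative(prevalence: float, n_samples: int) -> float:
    """Probability that every one of n_samples is negative at a given prevalence."""
    return (1.0 - prevalence) ** n_samples


# Figures from the abstract: an estimated 6 never-smokers among 37 samples.
never_smokers = 6
total_samples = 37

fraction = never_smokers / total_samples
bound = zero_event_upper_bound(never_smokers)
miss_chance = prob_all_negative(0.39, never_smokers)

print(f"Never-smoker fraction: {fraction:.0%}")
print(f"Prevalence not excluded by 0 of {never_smokers} positives: {bound:.0%}")
print(f"Chance of seeing no positives at 39% prevalence: {miss_chance:.1%}")
```

Under these assumptions, a subtype present in roughly four of every ten never-smokers would still produce no positive samples about one time in twenty, which is why a subgroup of six cannot rule it out.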

Conclusion

Because subtypes may need different therapeutic strategies, our results suggest the need to re-evaluate viral association in lung cancer subtypes, particularly never-smokers. In general, AI-assisted applicability alerts can help biomedical and AI researchers align datasets with research questions and flag limitations, lowering the risk of misguided AI models, biological inferences, and clinical decisions.
