Project Summary
Machine learning (ML) algorithms have become important tools for leveraging healthcare data to improve patient outcomes and streamline hospital processes. Nevertheless, there remain concerns regarding the reliability of these algorithms because they can perform poorly in certain populations. ML algorithms can also become “outdated” and gradually decay in performance as the medical system evolves. When the performance of an ML algorithm does not match specifications, it is important to understand the major reasons why. Once the main causes are identified, hospitals and data scientists can deploy strategies to close this performance gap, such as updating the ML algorithm and/or its input data.
There are currently no tools that provide sufficiently precise explanations for a performance gap. This project will develop new computational tools to address this methodological gap. The methods will be thoroughly validated through theoretical analyses, computer simulations, and evaluation on real-world datasets. The research team will regularly consult its stakeholder engagement team, which includes a clinical informatician, clinicians, regulatory experts, a biostatistician, and a bioethicist. Open-source software for running the methods will be published online.