CAREER: Expediting Next-Generation AI for Health via KG-LLM Co-Learning
Open · NSF
Large language models (LLMs) have reshaped artificial intelligence (AI) research and practice. In healthcare, there has been considerable enthusiasm for using LLMs to answer medical questions, extract clinical information, and assist clinical decisions. Studies have also revealed limitations of LLMs, including gaps in knowledge, imprecise inference, and hallucination. Knowledge graphs (KGs) have been widely studied because they store knowledge that is accurate, explicit, and easily modifiable. In healthcare, various medical KGs have been used to support basic science research, pharmaceutical research, clinical decisions, and policy making. However, healthcare data are notoriously noisy and complex: datasets about specific concepts and conditions come from various sources and span multiple data modalities. While pioneering studies have started to explore combining KGs and LLMs for healthcare, most have focused on applying existing methods rather than fundamentally improving KGs and LLMs for healthcare problems. In this project, the research team aims to comprehensively investigate and address the data, model, and application challenges in healthcare through Knowledge Graph-Large Language Model Co-Learning (KG-LLM Co-learning), a systematic framework intended to provide most, if not all, of the major functionality needed to (1) build high-quality KGs that integrate complex healthcare data, (2) enhance LLMs to obtain reliable healthcare models, and (3) leverage multi-agent systems to account for data privacy, human values, and broader factors in healthcare applications.
The proposed KG-LLM Co-learning framework includes several transformative technical innovations. First, the investigator's team will develop novel LLM-based methods for constructing comprehensive healthcare KGs, where they will study: (1) ontology-infused prompt designs for unifying existing healthcare KGs collected from different sources, (2) structure-oriented retrieval-augmented generation to continuously improve the healthcare KG as the biomedical literature evolves, and (3) alignment-based instruction tuning to further enrich the healthcare KG with multimodal patient data. Second, the investigator's team will design novel uses of KGs to obtain reliable healthcare LLMs, where they will focus on: (1) enhancing the planning capability of LLMs by providing biomedical knowledge from KGs, (2) enhancing the reasoning capability of LLMs by enabling biomedical neuro-symbolic rule learning over KGs, and (3) enhancing the grounding capability of LLMs by enforcing post-hoc biomedical error detection based on KGs. Third, the investigator's team will explore a new way of integrating data and models through the collaboration of LLMs in a flexible federated multi-agent system, which can rigorously facilitate: (1) protection of data privacy, (2) alignment with human values, and (3) consideration of broader health factors. Finally, the investigator's team will conduct comprehensive evaluations on the important healthcare applications of risk prediction, treatment suggestion, and disease subtyping, using de-identified patient data from publicly available databases, Emory Hospitals, and the NIH Bridge2AI CHoRUS Consortium, in collaboration with healthcare professionals and clinical experts.
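To make the idea of post-hoc error detection based on KGs more concrete, the following is a minimal sketch, not the project's actual method. It assumes that claims have already been extracted from an LLM answer as (head, relation, tail) triples, and it checks them against a tiny toy KG. The names `Triple`, `TOY_KG`, `CONFLICTS`, `check_claim`, and `flag_errors` are all hypothetical and chosen only for illustration; the toy facts are illustrative, not clinical guidance.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A (head, relation, tail) fact, the usual unit of a knowledge graph."""
    head: str
    relation: str
    tail: str


# Toy knowledge graph (hypothetical, for illustration only).
TOY_KG = {
    Triple("metformin", "treats", "type 2 diabetes"),
    Triple("metformin", "contraindicated_in", "severe renal impairment"),
    Triple("warfarin", "interacts_with", "aspirin"),
}

# Assumption: these relation pairs are treated as mutually exclusive
# for the same head and tail.
CONFLICTS = {
    "treats": "contraindicated_in",
    "contraindicated_in": "treats",
}


def check_claim(claim: Triple, kg: set = TOY_KG) -> str:
    """Label a claim as 'supported', 'contradicted', or 'unverified'."""
    if claim in kg:
        return "supported"
    opposite = CONFLICTS.get(claim.relation)
    if opposite is not None and Triple(claim.head, opposite, claim.tail) in kg:
        return "contradicted"
    # Absence from the KG is not evidence of error, only of missing support.
    return "unverified"


def flag_errors(claims: list, kg: set = TOY_KG) -> list:
    """Return (claim, label) pairs for every claim the KG does not support."""
    labeled = [(claim, check_claim(claim, kg)) for claim in claims]
    return [(claim, label) for claim, label in labeled if label != "supported"]


if __name__ == "__main__":
    # Claims assumed to have been extracted from an LLM answer upstream.
    llm_claims = [
        Triple("metformin", "treats", "type 2 diabetes"),
        Triple("metformin", "treats", "severe renal impairment"),
        Triple("aspirin", "treats", "migraine"),
    ]
    for claim, label in flag_errors(llm_claims):
        print(label, claim)
```

In this sketch a claim is only marked as contradicted when the KG explicitly holds a conflicting fact; anything else that is missing from the KG is reported as unverified, so that an incomplete KG does not turn into false error reports.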
If successful, the studies will fill critical gaps between AI advances and healthcare, informing methodology and technology design in the data management and AI/machine learning communities, while potentially generating new biomedical and disease knowledge for improving healthcare.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Up to $348K
machine learning