NSF
In the era of big data, researchers and analysts have unprecedented access to large datasets, opening new opportunities for scientific discovery and predictive modeling. These datasets, often collected from different sources, exhibit several forms of heterogeneity, including distribution heterogeneity, observation heterogeneity, and task heterogeneity. Addressing these complexities effectively is critical to realizing the full potential of the data. This project tackles the integration of data from multiple heterogeneous sources in supervised learning by introducing a novel framework, the Representation Retrieval (R2) framework, which addresses all three types of heterogeneity simultaneously. The framework combines advanced representation learning with sparsity-inducing machine learning algorithms. In addition, the investigator develops a new integrative penalty designed to improve both the integration and the effectiveness of the learned representations. This project will lead to a new paradigm for extracting and integrating information from heterogeneous data sources, providing fundamental solutions and a rigorous framework for these challenges. The project will have broad applications across fields including medical and health sciences, social sciences, political science, education, finance, marketing, and artificial intelligence. The investigator will integrate education with research by developing a new course on data integration.
The R2 framework extracts a shared representation dictionary from multiple heterogeneous data sources, then selects source-specific retrievers and estimates source-specific learners. The representation dictionary, accessible to all data sources, contains a set of representers that map covariates into a low-dimensional latent space. Each data source uses its own retrievers to select informative representers, which project that source's covariates into an appropriate latent space.
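The dictionary, retriever, and learner pipeline described above can be sketched in a few lines. This is a minimal illustration under assumptions the abstract does not state: each representer is taken to be a fixed linear map (a vector in covariate space), a retriever is simply a list of representer indices, and each learner is a linear model on the retrieved features. The names `retrieve` and `predict` and all numbers are hypothetical; this is not the investigator's method.

```python
from typing import List

Vector = List[float]

# ASSUMPTIONS (hypothetical, for illustration only): each representer is a
# fixed linear map given by a vector in covariate space; a retriever is a list
# of representer indices; each learner is a linear model on retrieved features.


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(x: Vector, dictionary: List[Vector], retriever: List[int]) -> Vector:
    """Project covariates x onto the representers this source's retriever selects."""
    return [dot(dictionary[k], x) for k in retriever]


def predict(x: Vector, dictionary: List[Vector], retriever: List[int],
            learner: Vector, intercept: float = 0.0) -> float:
    """Apply a source-specific linear learner to the retrieved representation."""
    z = retrieve(x, dictionary, retriever)
    return intercept + dot(learner, z)


# Shared dictionary: 3 representers over 4 covariates, visible to every source.
dictionary = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Each source retrieves its own subset of representers and has its own learner.
sources = {
    "A": {"retriever": [0, 1], "learner": [2.0, -1.0]},
    "B": {"retriever": [1, 2], "learner": [0.5, 3.0]},
}

x = [1.0, 2.0, 3.0, 4.0]
for name, s in sources.items():
    z = retrieve(x, dictionary, s["retriever"])
    print(name, z, predict(x, dictionary, s["retriever"], s["learner"]))
# prints:
# A [1.0, 5.0] -3.0
# B [5.0, 4.0] 14.5
```

Both sources reuse representer 1, the kind of partial sharing across sources the framework is meant to exploit. In the actual method the dictionary, retrievers, and learners are learned from data, as the abstract describes, rather than fixed by hand as here.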
Using these retrieved representations, source-specific learners are then applied to predict responses. The project aims to make significant contributions to the field of data integration through the following advances:
(1) Formulating a General Data Integration Framework: define a supervised learning problem that incorporates all three types of heterogeneity and includes several well-studied problem setups as special cases.
(2) Introducing the Representation Retrieval (R2) Framework: develop a comprehensive framework that addresses all three types of heterogeneity and overcomes common limitations of existing methods.
(3) Addressing Distribution Heterogeneity: leverage a “partially-sharing structure” to model distribution heterogeneity, solve the resulting optimization problems with sparsity-inducing penalties, and introduce a novel “Selective Integration Penalty” that encourages representers to be shared across multiple data sources.
(4) Handling Observation Heterogeneity: propose a non-imputation-based approach to observation heterogeneity, providing a robust alternative to conventional methods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
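The abstract does not give the form of the Selective Integration Penalty. As a hypothetical stand-in that shows how a penalty can reward sharing, the sketch below uses a standard group-lasso (ℓ2,1) penalty that groups each representer's weights across sources; it is not the project's penalty. With the same total weight, one representer used by both sources costs less under the group penalty than two representers each used by a single source, whereas an ordinary lasso (ℓ1) penalty cannot tell the two layouts apart. The function names and weight matrices are illustrative assumptions.

```python
import math
from typing import List

# HYPOTHETICAL stand-in for the Selective Integration Penalty: a group-lasso
# (l2,1) penalty. weights[k][s] is how strongly source s uses representer k.
Matrix = List[List[float]]


def group_penalty(weights: Matrix) -> float:
    """Sum over representers of the Euclidean norm of that representer's
    weights across all sources; this favors representers shared by many sources."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in weights)


def lasso_penalty(weights: Matrix) -> float:
    """Plain l1 penalty: sums absolute weights, ignoring which source uses what."""
    return sum(abs(w) for row in weights for w in row)


# Two sources, two representers, identical total weight in both layouts.
shared = [[1.0, 1.0],   # representer 0 used by both sources
          [0.0, 0.0]]   # representer 1 unused
split = [[1.0, 0.0],    # representer 0 used only by source 0
         [0.0, 1.0]]    # representer 1 used only by source 1

print(group_penalty(shared), group_penalty(split))  # prints 1.4142135623730951 2.0
print(lasso_penalty(shared), lasso_penalty(split))  # prints 2.0 2.0
```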
Up to $175K
2028-07-31
Research Infrastructure: National Geophysical Facility (NGF): Advancing Earth Science Capabilities through Innovation - EAR Scope
NSF — up to $26.6M
AmLight: The Next Frontier Towards Discovery in the Americas and Africa
NSF — up to $9M
CREST Phase II Center for Complex Materials Design
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Energy Technologies
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Post-Transcriptional Regulation
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Semiconductors Research
NSF — up to $7.5M