NSF
Graphics Processing Units (GPUs) are the go-to choice for deep learning because of their exceptional computational power and massive parallelism. However, maximizing GPU performance for model development and inference remains notoriously challenging as models grow more complex and span multiple abstraction layers: the upstream Python layer, the midstream C/C++ layer, and the downstream GPU kernel layer. While this layered design meets diverse application needs, it also embeds inefficiencies that are difficult to detect because of intricate cross-layer interactions. This project addresses these inefficiencies through a comprehensive, cross-layer performance analysis of deep learning models. Its novelty lies in advancing state-of-the-art profiling techniques to enable systemic performance tuning across all layers. Its broader significance lies in deepening the understanding of systemic performance issues in deep learning, thereby strengthening the foundations of code analysis and advancing fields increasingly reliant on deep learning, such as image processing. With interest from industry leaders such as Meta, the project shows strong potential for translating academic insights into practical applications. The project also contributes to education and outreach by integrating its findings into computer science curricula and K-12 programs, cultivating a workforce skilled in performance analysis and optimization.
The project is structured around three innovative analysis techniques. (1) Unified binary code analysis consolidates all layers of a deep learning model into a shared binary abstraction, enabling the identification of cross-layer inefficiencies in code segments and data objects. (2) Incremental analysis progressively narrows the set of monitored performance metrics to pinpoint the root causes of the inefficient code segments identified by the unified binary analysis. (3) Data object analysis diagnoses the root causes of the inefficient data objects identified by the unified binary analysis. Together, these techniques form a comprehensive approach to performance tuning that addresses inefficiencies from a systemic perspective and maximizes GPU capabilities in deep learning. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
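To make the idea of incremental analysis more concrete, here is one way a drill-down over performance metrics could work. It is a minimal sketch under stated assumptions, not the project's method. The metric hierarchy (`METRIC_TREE`), the metric names, and the `collect` callback standing in for a profiling pass are all hypothetical, and the numbers in the usage example are toy values, not measurements. At each step only the children of the currently dominant metric are monitored, so the set of collected metrics shrinks as the analysis narrows toward a root cause.

```python
from typing import Callable, Dict, List

# Hypothetical metric hierarchy: each metric maps to finer-grained child metrics.
# These names are illustrative only and do not come from any specific profiler.
METRIC_TREE: Dict[str, List[str]] = {
    "kernel_time": ["compute_stall", "memory_stall"],
    "compute_stall": ["pipe_busy", "dependency_stall"],
    "memory_stall": ["global_load_stall", "shared_bank_conflict"],
}


def drill_down(root: str,
               collect: Callable[[List[str]], Dict[str, float]]) -> List[str]:
    """Follow the dominant metric from `root` down to a leaf metric.

    Each iteration asks `collect` (a stand-in for one profiling pass) for
    only the children of the current metric, so the monitored set narrows
    at every step instead of recording every metric up front.
    """
    path = [root]
    current = root
    while current in METRIC_TREE:
        children = METRIC_TREE[current]
        values = collect(children)  # monitor only this narrowed subset
        current = max(children, key=lambda m: values[m])  # greedy: largest child
        path.append(current)
    return path


# Toy values standing in for profiler output (hypothetical, not real data).
TOY_VALUES: Dict[str, float] = {
    "compute_stall": 0.2, "memory_stall": 0.7,
    "pipe_busy": 0.1, "dependency_stall": 0.1,
    "global_load_stall": 0.5, "shared_bank_conflict": 0.1,
}


def fake_collect(metrics: List[str]) -> Dict[str, float]:
    """Return toy values for the requested metrics only."""
    return {m: TOY_VALUES[m] for m in metrics}


if __name__ == "__main__":
    print(drill_down("kernel_time", fake_collect))
    # prints ['kernel_time', 'memory_stall', 'global_load_stall']
```

Following only the single largest child keeps the sketch short; a real tool might keep several candidates open or weight metrics by their estimated impact before narrowing further.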
Up to $342K
2030-06-30