NSF
Recent advances in large machine learning models have demonstrated that increasing the number of parameters improves model quality and unlocks capabilities once deemed unattainable. This trend is exemplified by the rapid growth in model sizes: GPT-3 contained 175 billion parameters, while GPT-4 reportedly uses up to 1.8 trillion, and this trajectory is expected to continue for the foreseeable future. However, the explosive growth in model size presents two major challenges for computer architecture and systems research: prolonged simulation times, which can extend from several days to weeks for large-scale models, and the infeasibility of deploying workloads on a single compute engine (e.g., a graphics processing unit (GPU)) due to limited on-device memory capacity. To address these challenges, this project proposes the development of scalable simulation techniques and advanced memory management strategies tailored for large-scale machine learning workloads on GPUs. Unlike existing application-agnostic approaches, this research will leverage the distinctive data access patterns and value distributions of modern machine learning models to enable more efficient memory compression and more accurate simulation acceleration. While the primary focus will be on emerging machine learning models, the broader objective is to advance GPU computing to better accommodate any big-data workload constrained by memory limitations. This will facilitate faster and broader adoption of GPUs across diverse computing domains, driving continued innovation in computational science. The outcomes of this research will be integrated into both new and existing undergraduate and graduate curricula, as well as K-12 outreach initiatives, fostering a deeper understanding of cutting-edge computing technologies across educational levels.
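The memory-capacity challenge above can be made concrete with back-of-envelope arithmetic. The sketch below is illustrative only (the byte counts and GPU capacity are common reference figures, not numbers from the proposal): even the weights of a GPT-3-scale model, before activations, gradients, or optimizer state, exceed a single high-end GPU's memory several times over.

```python
def model_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for model parameters alone, in GB.

    Assumes FP16 storage (2 bytes per parameter), a common choice.
    """
    return num_params * bytes_per_param / 1e9

# GPT-3: 175 billion parameters (figure from the abstract above).
gpt3_gb = model_memory_gb(175e9)   # 350 GB for weights alone
gpu_hbm_gb = 80                    # e.g., a single 80 GB data-center GPU

print(f"GPT-3 weights: {gpt3_gb:.0f} GB vs. {gpu_hbm_gb} GB on-device memory")
# The weights alone oversubscribe a single device's memory by more than 4x,
# motivating the memory expansion and compression strategies studied here.
```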
This project will answer two research questions: how to simulate large-scale machine learning computation efficiently, and how to better utilize GPU local memory when that memory is oversubscribed. While large-scale simulation and memory management have been widely studied, most existing approaches fail to capture the unique architectural characteristics of GPU computing and the specific behaviors of emerging machine learning workloads. Rather than relying on application-agnostic or user-dependent sampling techniques, this research will exploit the distinctive compute and memory access patterns inherent to machine learning models. The first thrust will develop an efficient simulator-acceleration methodology by leveraging the fact that machine learning models are typically executed with highly optimized library functions. These library functions tend to exhibit similar architectural behaviors depending on their operational and data-size characteristics. The project will identify representative sample kernels whose performance can be extrapolated to other similar kernels, thereby significantly reducing simulation overhead. By leveraging characteristics of the library functions, the second thrust will explore efficient memory expansion and compression strategies, such as dynamic memory prefetching and eviction policies, to mitigate the effects of memory oversubscription. It will also develop novel quantization techniques that take advantage of the unique value distributions of weights and gradients within individual tensors. Unlike tensor-oblivious methods, this targeted approach aims to reduce memory footprint more effectively while preserving model accuracy. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
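The second thrust's idea of tensor-aware quantization can be sketched in a few lines. The code below is a minimal illustration, not the proposal's actual technique: it applies symmetric absmax 8-bit quantization with a scale derived from each tensor's own value distribution, rather than one tensor-oblivious global scale, so a tensor with a narrow range keeps more precision.

```python
import numpy as np

def quantize_per_tensor(t: np.ndarray, bits: int = 8):
    """Symmetric absmax quantization with a per-tensor scale.

    Each tensor gets a scale matched to its own value range, unlike
    tensor-oblivious schemes that share one scale across all tensors.
    """
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8-bit
    scale = float(np.abs(t).max()) / qmax if t.size else 1.0
    q = np.clip(np.round(t / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from its quantized form."""
    return q.astype(np.float32) * scale

# Example: a weight tensor with a narrow value distribution (typical of
# trained layers) is reconstructed with small per-element error.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=1024).astype(np.float32)
q, s = quantize_per_tensor(weights)
max_err = float(np.abs(dequantize(q, s) - weights).max())
```

With 4x fewer bytes per element than FP32, such per-tensor scaling bounds the rounding error by half the tensor's own scale, which is far tighter than a scale inherited from an unrelated, wide-range tensor.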
Up to $270K
2028-09-30
Research Infrastructure: National Geophysical Facility (NGF): Advancing Earth Science Capabilities through Innovation - EAR Scope
NSF — up to $26.6M
AmLight: The Next Frontier Towards Discovery in the Americas and Africa
NSF — up to $9M
CREST Phase II Center for Complex Materials Design
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Energy Technologies
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Post-Transcriptional Regulation
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Semiconductors Research
NSF — up to $7.5M