Collaborative Research: RI: Small: Evaluation Concepts for Assessing and Improving Large Language Models

NSF

open

Systems based on modern large language models (LLMs) play an increasing role in how users access information and compose text. For instance, a user executing a web search will increasingly rely on LLM-based systems to summarize the search results rather than viewing individual web pages, and might use LLM-based systems to “talk to” long documents like financial reports rather than reading them in their entirety. To support these new paradigms, it is important that an LLM be able to generate responses that are factual, informative, and safe. However, satisfying these criteria is not sufficient: a response should also be at the right level of abstraction or detail, in the right format, creative where appropriate, and aligned with other user needs. Current practice has neglected evaluation of these more subtle factors. This project proposes to address these shortcomings by identifying a set of “evaluation concepts” that indicate the kinds of areas where LLMs are failing, like “lack of detail in a list.” The project will then develop technology for automatically evaluating and improving LLM responses according to these concepts.

This project aims to improve the evaluation and the functionality of LLMs in two ways. First, the project will discover a concept taxonomy and learn how to evaluate LLM responses according to the concepts in that taxonomy. This process will necessitate advances in reward models, which are themselves LLMs, customized to reliably score responses. Second, the project will apply these reward models to improve the LLMs’ responses. Specifically, the project will curate training data exhibiting the correct kinds of behavior for each concept, enabling training of LLMs that do better on those concepts. The project will also develop methods for iteratively improving responses using these reward models. The project will open-source the concept taxonomy and reward models that will outperform closed-source, proprietary models.

These models will enable the public to have a better sense of the performance of LLM systems across a variety of applications, and will drive the open-source community to build stronger, more reliable LLM systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Focus Areas

research

Eligibility

university, nonprofit, small business

How to Apply

Funding Range

Up to $300K

Deadline

2028-07-31
