NSF
Speech technology, including artificial intelligence (AI) trained on speech data, performs poorly in cases where little or no recorded audio data exists to train the required AI models. Building better speech technology in these cases requires creating collections of speech materials and their transcriptions. However, without the assistance of existing AI technologies, transcription is immensely time-consuming. This project builds a high-quality speech data set to enable phonetics and phonology research for several low-data languages, and to model an approach to easing the "transcription bottleneck" with techniques from AI and natural language processing (NLP). The project jointly engages the expert perspectives of users of the target languages, linguists, and computer scientists, and establishes an infrastructure for collaborative, computationally mediated language work. Other benefits to society include bridging laboratory-style research and real-world applications and providing innovative educational opportunities for trainees.

This project builds a 60-hour corpus of naturalistic and read speech data recorded in the field, suitable for both AI/NLP applications and research in acoustic phonetics and phonology. Unsupervised or weakly supervised machine learning techniques are used to semi-automatically transcribe and annotate a portion of the speech corpus. This transcription and annotation process uses a novel human-in-the-loop approach that makes direct use of expert speaker input: transcripts that pretrained language models produce for the recorded audio are corrected by trained language experts. These corrected annotations are incorporated into subsequent rounds of model training and fine-tuning to further improve output accuracy.
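The human-in-the-loop cycle described above (model drafts, expert correction, retraining on the accumulated corrections) can be sketched as a simple loop. This is a hypothetical illustration under stated assumptions, not the project's implementation: the function names, the clip IDs, the use of character error rate (CER) to track accuracy, and the `memorizing_fine_tune` stand-in for real fine-tuning are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "transcriber" maps an audio clip ID to a draft transcript (hypothetical interface).
Transcriber = Callable[[str], str]


def edit_distance(hyp: str, ref: str) -> int:
    """Character-level Levenshtein distance between hyp and ref."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1,               # delete h
                            curr[j - 1] + 1,           # insert r
                            prev[j - 1] + (h != r)))   # substitute or match
        prev = curr
    return prev[-1]


def cer(hyp: str, ref: str) -> float:
    """Character error rate of a draft against the expert-corrected reference."""
    return edit_distance(hyp, ref) / max(len(ref), 1)


def human_in_the_loop(
    rounds: Sequence[Sequence[str]],
    model: Transcriber,
    expert_correct: Callable[[str, str], str],
    fine_tune: Callable[[Transcriber, List[Tuple[str, str]]], Transcriber],
) -> Tuple[Transcriber, List[float]]:
    """Draft, expert correction, then retraining, once per round of clips."""
    training_data: List[Tuple[str, str]] = []
    history: List[float] = []
    for clips in rounds:
        drafts = {c: model(c) for c in clips}                     # model proposes transcripts
        fixed = {c: expert_correct(c, drafts[c]) for c in clips}  # experts correct them
        # Mean CER of this round's drafts against the corrected transcripts.
        history.append(sum(cer(drafts[c], fixed[c]) for c in clips) / len(clips))
        training_data.extend(fixed.items())                       # accumulate corrected labels
        model = fine_tune(model, training_data)                   # retrain on all corrections so far
    return model, history


def memorizing_fine_tune(base: Transcriber, data: List[Tuple[str, str]]) -> Transcriber:
    """Toy stand-in for real fine-tuning: it only memorizes corrected clips."""
    memory: Dict[str, str] = dict(data)
    return lambda clip: memory.get(clip, base(clip))


# Toy demo: hypothetical clip IDs with tone-marked "gold" transcripts.
gold = {"c1": "ma1", "c2": "ba3", "c3": "ta2"}
base_model: Transcriber = lambda clip: gold[clip][:-1]  # pretend the model misses the tone mark
_, history = human_in_the_loop(
    rounds=[["c1", "c2"], ["c1", "c3"]],
    model=base_model,
    expert_correct=lambda clip, draft: gold[clip],      # simulated expert correction
    fine_tune=memorizing_fine_tune,
)
print([round(x, 3) for x in history])  # prints [0.333, 0.167]
```

In this toy run the second-round error drops only because clip `c1` is seen again after correction; an actual fine-tuned model would be expected to generalize to unseen clips, which the memorizing stub deliberately does not attempt.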
The target languages exhibit several unusual phonetic and phonological features, such as complex lexical tone, stem-initial prominence with unclear acoustic correlates, vowels with consonant-like constriction features, and variable external sandhi processes; these features form the basis for exploratory phonetic and phonological research. The speech corpus, annotations, and language models are available as a starting point for linguistics, NLP, and AI work on related languages with translational impact. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Up to $102K
2027-08-31
Research Infrastructure: National Geophysical Facility (NGF): Advancing Earth Science Capabilities through Innovation - EAR Scope
NSF — up to $26.6M
AmLight: The Next Frontier Towards Discovery in the Americas and Africa
NSF — up to $9M
EPSCoR CREST Phase I: Center for Energy Technologies
NSF — up to $7.5M
CREST Phase II Center for Complex Materials Design
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Post-Transcriptional Regulation
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Semiconductors Research
NSF — up to $7.5M