CAREER: Foundational 4D Human Video Understanding
Open | NSF
Understanding human behavior from video is a challenging and transformative area of research, with applications in robotics, assistive technologies, neuroscience, and beyond. Humans do not act in isolation; their actions are shaped by their surroundings, their interactions with others, and the objects they use. This project aims to develop a new foundational paradigm for understanding humans in 4D (their 3D state over time) from any type of video. Unlike current methods, this approach integrates people with their physical and social context, enabling a deeper understanding of human activities. By creating a computational framework that can analyze both exocentric (third-person) and egocentric (first-person) videos, the project addresses the limitations of existing methods and supports downstream applications such as assistive technologies, wearable AI, and data analysis for neuroscience and everyday practical tasks. The resulting advances will enable robots to learn from observing humans, assistive technologies to better support users, and wearable devices to provide richer context for human activity, contributing to safer, more effective, and more accessible technologies with impact across science, industry, and society.
This project will design a scalable, transformer-based model to capture the 4D state of humans and their situational context, including surrounding environments, social interactions, and object use. By leveraging recent advances in 3D pose estimation, scene reconstruction, and large-scale multimodal models, the research will unify these aspects into a single flexible framework. The approach accommodates single-view and multi-view video inputs, and the project will comprehensively evaluate its performance. The resulting open-source code, models, and data will give researchers tools to advance 4D human understanding and related fields. The project also integrates research with education by developing curricula that combine vision, geometry, and machine learning, by creating summer research opportunities for a wide range of students, and by building accessible online tools to engage a broader audience.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Up to $342K
machine learning, education, social science