Keynote Speakers

Jehee Lee
Seoul National University, South Korea
Jehee Lee is a Professor in the Department of Computer Science and Engineering at Seoul National University. His research centers on understanding, simulating, planning, and synthesizing the motion of humans and animals. He is internationally recognized for his pioneering work in modeling and simulating the human musculoskeletal system. He has served in key roles for premier conferences, including Technical Papers Chair for SIGGRAPH Asia 2022 and Test-of-Time Award Chair for ACM SIGGRAPH 2023.
Title: Generative GaitNet and Beyond: Foundational Models for Human Motion Analysis and Simulation
Abstract: Understanding the relationship between human anatomy and motion is fundamental to effective gait analysis, realistic motion simulation, and the creation of human body digital twins. We will begin with Generative GaitNet (SIGGRAPH 2022), a foundational model for human gait that drives a comprehensive full-body musculoskeletal system comprising 304 Hill-type musculotendons. Generative GaitNet is a pre-trained, integrated system of artificial neural networks that operates in a 618-dimensional continuous space defined by anatomical factors (e.g., mass distribution, body proportions, bone deformities, and muscle deficits) and gait parameters (e.g., stride and cadence). Given specific anatomy and gait conditions, the model generates corresponding gait cycles via real-time physics-based simulation. Next, we will discuss Bidirectional GaitNet (SIGGRAPH 2023), which consists of forward and backward models. The forward model predicts the gait pattern of an individual based on their physical characteristics, while the backward model infers physical conditions from observed gait patterns. Finally, we will present MAGNET (Muscle Activation Generation Networks)—another foundational model (SIGGRAPH 2025)—designed to reconstruct full-body muscle activations across a wide range of human motions. We will demonstrate its ability to accurately predict muscle activations from motions captured in video footage. We will conclude by discussing how these foundational models collectively contribute to the development of human body digital twins, and explore their future potential in personalized rehabilitation, surgery planning, and human-centered simulation.

Gerard Pons-Moll
University of Tübingen, Germany
Gerard Pons-Moll is a Professor of Computer Science at the University of Tübingen, holding a chair endowed by the Carl Zeiss Foundation. He is also core faculty at the Tübingen AI Center, a senior researcher at the Max Planck Institute for Informatics (MPII) in Saarbrücken, Germany, and faculty at the IMPRS-IS (International Max Planck Research School for Intelligent Systems) in Tübingen. His research lies at the intersection of computer vision, computer graphics, and machine learning, with a special focus on analyzing people in videos and creating virtual human models by "looking" at real ones. His research has produced some of the most advanced statistical human body models of pose, shape, soft tissue, and clothing (currently used in a number of applications in industry and research), as well as algorithms to track and reconstruct 3D people models from images, video, depth, and IMUs.
His work has received several awards, including the prestigious Emmy Noether Grant (2018), Google Faculty Research Awards (2019, 2024), Facebook Reality Labs Faculty Awards (2018, 2024), and the German Pattern Recognition Award (2019), which is given annually by the German Pattern Recognition Society to one outstanding researcher in the fields of computer vision and machine learning. His work has received Best Paper Awards at BMVC’13, Eurographics’17, 3DV’18, 3DV’22, CVPR’20, and ECCV’22, and has been published at top venues and journals including CVPR, ICCV, SIGGRAPH, Eurographics, 3DV, IJCV, and PAMI. He regularly serves as an area chair for the major conferences in learning and vision and is an associate editor of PAMI.
Title: How to Train Large-Scale 3D Human and Object Foundation Models
Abstract: Understanding 3D humans interacting with the world has been a long-standing goal in AI and computer vision for decades. The lack of 3D data has been the major barrier to progress. This is changing with the growing number of 3D datasets featuring images, videos, and multi-view captures with 3D annotations, as well as with large-scale image foundation models. However, learning models from such sources is non-trivial. Some of the challenges are: 1) datasets are annotated with different 3D skeleton formats and outputs, and 2) image foundation models are 2D, and extracting 3D information from them is hard. I will present solutions to each of these two challenges. I will introduce a universal training procedure that can consume any skeleton format, a diffusion-based method tailored to lifting foundation models to 3D (for humans and also general objects), and a mechanism to probe the geometry and texture awareness of 3D foundation model features based on 3D Gaussian splatting reconstruction. I will also show a method to systematically create 3D human benchmarks on demand for evaluation (STAGE).

Xubo Yang
Shanghai Jiao Tong University, China
Xubo Yang is a Professor of Virtual/Augmented Reality and Computer Graphics at the School of Software, Shanghai Jiao Tong University, where he leads the Digital ART (Augmented Reality Tech) Laboratory. He received his Ph.D. (1998) in Computer Graphics from the State Key Lab of CAD & CG at Zhejiang University. From 1998 to 2001, he was a research scientist in the Virtual Environment Group of the Fraunhofer Institute for Media Communication in Germany. From 2001 to 2003, he worked as a research fellow at the Mixed Reality Lab of the National University of Singapore. From 2012 to 2013, he was a visiting professor in the Department of Computer Science at the University of North Carolina at Chapel Hill. Xubo’s research interests include next-generation media art computing technologies in the context of virtual reality, augmented reality, computer graphics, and novel interactive techniques. He has published many peer-reviewed papers in the fields of computer graphics, virtual reality, augmented reality, and mixed reality.
Title: To be announced
Abstract: To be announced.