Date and room TBA
Recent advances in generative modeling and semantic understanding have spurred significant interest in the synthesis and understanding of 3D scenes. The application potential in 3D is significant: augmented and virtual reality, computational photography, interior design, and autonomous mobile robots all require a deep understanding of 3D scene spaces.
We propose to offer the first benchmark challenge for novel view synthesis in large-scale 3D scenes, along with high-fidelity, large-vocabulary 3D semantic scene understanding, where complete, high-fidelity ground-truth scene data is available. This is enabled by the new ScanNet++ dataset, which offers 1mm-resolution laser scan geometry, high-quality DSLR image capture, and dense semantic annotations over 1000 class categories. Notably, existing view synthesis benchmarks leverage data captured from a single continuous trajectory, making it impossible to evaluate novel views that lie outside the original capture trajectory. In contrast, our novel view synthesis challenge uses test images captured intentionally outside the train image trajectory, enabling comprehensive evaluation of state-of-the-art methods in new, challenging scenarios.
Please download the dataset here and submit your result before May 22 to be considered for the challenge.
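Novel view synthesis submissions are typically scored by comparing rendered test views against held-out ground-truth images with metrics such as PSNR. A minimal sketch of that comparison is below; the function and variable names are illustrative only and do not reflect the challenge's actual evaluation code or file layout:

```python
import numpy as np

def psnr(rendered: np.ndarray, ground_truth: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between two images with values in [0, max_val]."""
    mse = np.mean((rendered.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a synthetic "ground-truth" view and a slightly noisy "rendering".
gt = np.random.default_rng(0).random((64, 64, 3))
noisy = np.clip(gt + 0.01 * np.random.default_rng(1).standard_normal(gt.shape), 0.0, 1.0)
print(psnr(gt, gt))      # inf
print(psnr(noisy, gt))   # finite score, higher is better
```

Benchmarks of this kind usually report PSNR alongside perceptual metrics such as SSIM and LPIPS, averaged over all held-out test views per scene.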
📢 New this year 📢 iPhone NVS Benchmark: train on commodity-level captures and test against high-quality DSLR images and 360-degree RGB-D panoramic images. Check it out!

| Welcome and Introduction | | |
| Invited Talk 1: | Speaker 1 | |
| Invited Talk 2: | Speaker 2 | |
| Winner Talk: | Winner talk 1 Winner talk 2 Winner talk 3 | |
| Invited Talk 3: | Speaker 3 | |
| Invited Talk 4: | Speaker 4 | |
| Invited Talk 5: | Speaker 5 | |
| Panel Discussion and Conclusion | | |
Lourdes Agapito holds the position of Professor of 3D Vision at the Department of Computer Science, University College London (UCL). Her research in computer vision has consistently focused on the inference of 3D information from single images or videos acquired from a moving camera. She received her BSc, MSc and PhD degrees from the Universidad Complutense de Madrid (Spain). In 1997 she joined the Robotics Research Group at the University of Oxford as an EU Marie Curie Postdoctoral Fellow. In 2001 she was appointed Lecturer at the Department of Computer Science at Queen Mary University of London. From 2008 to 2014 she held an ERC Starting Grant funded by the European Research Council to focus on theoretical and practical aspects of deformable 3D reconstruction from monocular sequences. In 2013 she joined the Department of Computer Science at University College London and was promoted to full professor in 2015. Lourdes was Program Chair for CVPR 2016 and ICCV 2023 and serves regularly as Area Chair for the top computer vision conferences (CVPR, ICCV, ECCV). She was a keynote speaker at ICRA 2017, ICLR 2021, 3DV 2022 and ECCV 2024. In 2017 she co-founded Synthesia, the London-based generative AI startup responsible for the technology behind the Malaria No More video campaign, in which David Beckham spoke 9 different languages to call on world leaders to take action to defeat malaria.
Yiyi Liao is an assistant professor at Zhejiang University, leading the X-Dimensional Representations Lab (X-D Lab). Before that, she was a Postdoc in the Autonomous Vision Group at the University of Tübingen and the MPI for Intelligent Systems, working with Prof. Andreas Geiger. She received her Ph.D. in Control Science and Engineering from Zhejiang University in June 2018 and her B.S. degree from Xi'an Jiaotong University in 2013. Her research interests lie in 3D computer vision, including scene understanding, 3D reconstruction, and 3D generative models.
David Novotny is a Computer Vision Researcher specializing in 3D computer vision. Previously he was a Research Scientist at Meta AI Research London, and a DPhil student at the Visual Geometry Group, University of Oxford, funded by Naver Labs Europe. His supervisors were Diane Larlus (Naver Labs Europe) and Andrea Vedaldi (University of Oxford). His research focuses on deep learning for 3D reconstruction (creating virtual twins) and 3D generative AI.
Andrea Tagliasacchi is an associate professor at Simon Fraser University (Vancouver, Canada), where he holds the appointment of Visual Computing Research Chair within the School of Computing Science. He is also a part-time (20%) staff research scientist at Google DeepMind (Toronto, Canada), as well as an associate professor (status only) in the computer science department at the University of Toronto. Before joining SFU, he spent four wonderful years as a full-time researcher at Google (mentored by Paul Lalonde, Geoffrey Hinton, and David Fleet). Before joining Google, he was an assistant professor at the University of Victoria (2015-2017), where he held the Industrial Research Chair in 3D Sensing (jointly sponsored by Google and Intel). His alma maters include EPFL (postdoc), SFU (PhD, NSERC Alexander Graham Bell fellow) and Politecnico di Milano (MSc, gold medalist). Several of his papers have received best-paper award nominations at top-tier graphics and vision conferences, and he is the recipient of the 2015 SGP best paper award, the 2020 CVPR best student paper award, and the 2024 CVPR best paper award (honorable mention). His research focuses on 3D visual perception, which lies at the intersection of computer vision, computer graphics and machine learning.