Synthetic Data for Computer Vision @ CVPR 2024

Overview

The workshop explores the use of synthetic data in training and evaluating computer vision models, as well as in related domains. Over the last decade, advances in computer vision were catalyzed by the release of painstakingly curated human-labeled datasets. More recently, researchers have increasingly turned to synthetic data as an alternative to labor-intensive human labeling because of its scalability, customizability, and cost-effectiveness. Synthetic data offers the potential to generate large volumes of diverse, high-quality vision data tailored to specific scenarios and edge cases that are hard to capture in real-world data. However, challenges remain, including the domain gap between synthetic and real-world data, potential biases introduced during synthetic generation, and the difficulty of ensuring that models trained on synthetic data generalize. We hope the workshop provides a forum to discuss these challenges and encourage further exploration in these areas.

Invited Speakers

Ani Kembhavi

Allen Institute for AI (AI2)

Jia Deng

Princeton University

Ludwig Schmidt

University of Washington

Ming Lin

University of Maryland

Ruslan Salakhutdinov

Carnegie Mellon University

Yale Song

FAIR, Meta AI

Yannis Kalantidis

NAVER LABS Europe

Schedule

Date: June 18, 2024 · Full day
Talks: Summit 423-425
Posters: Arch Building Exhibit Hall

09:00 – 09:10 Opening
09:10 – 09:50 Talk by Ludwig Schmidt
09:50 – 10:30 Talk by Ruslan Salakhutdinov
10:30 – 10:50 Break
10:50 – 11:30 Talk by Yale Song
11:30 – 12:10 Talk by Jia Deng
12:10 – 13:30 Lunch Break
13:30 – 14:30 Poster Session
14:30 – 15:10 Talk by Ani Kembhavi
15:10 – 15:50 Talk by Ming Lin
15:50 – 16:10 Break
16:10 – 16:50 Talk by Yannis Kalantidis
16:50 – 17:05 Oral · CinePile: A Long Video Question Answering Dataset and Benchmark
17:05 – 17:20 Oral · GenAI-Bench: A Holistic Benchmark for Compositional Text-to-Visual Generation
17:20 – 17:30 Closing

Poster Session

Time: June 18 · 1:30 – 2:30 PM
Location: Arch Building Exhibit Hall
Poster Numbers: #300 – #349

Notice: the poster session location is different from the talk venue.

Awards

Best Long Paper

CinePile: A Long Video Question Answering Dataset and Benchmark

Ruchit Rawal, Khalid Saifullah, Ronen Basri, David Jacobs, Gowthami Somepalli, Tom Goldstein

Long Paper Honorable Mention

A Benchmark Synthetic Dataset for C-SLAM in Service Environments

Harin Park, Inha Lee, Minje Kim, Hyungyu Park, Kyungdon Joo

Best Short Paper

GenAI-Bench: A Holistic Benchmark for Compositional Text-to-Visual Generation

Baiqi Li, Zhiqiu Lin, Deepak Pathak, Jiayao Emily Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan

Short Paper Honorable Mention

R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding

Qirui Wu, Sonia Raychaudhuri, Daniel Ritchie, Manolis Savva, Angel X Chang

Accepted Papers · 42 papers

A Benchmark Synthetic Dataset for C-SLAM in Service Environments

Harin Park, Inha Lee, Minje Kim, Hyungyu Park, Kyungdon Joo
A Neural Model for High-Performance Scanning Electron Microscopy Image Simulation of Porous Materials

Tim Dahmen, Markus Kronenberger, Niklas Rottmayer, Katja Schladitz, Claudia Redenbach
An Approach to Synthesize Thermal Infrared Ship Images

Doan Thinh Vo, Phan Anh Đức, Nguyen Nhu Thao, Huong Ninh
Attributed Synthetic Data Generation for Zero-shot Image Classification

Shijian Wang, Linxin Song, Ryotaro Shimizu, Masayuki Goto, Hanqian Wu
Balancing Quality and Quantity: The Impact of Synthetic Data on Smoke Detection Accuracy in Computer Vision

Ethan Seefried, Changsoo Jung, Jack Fitzgerald, Mariah Bradford, Trevor Chartier, Nathaniel Blanchard
Beyond Internet Images: Evaluating Vision-Language Models for Domain Generalization on Synthetic-to-Real Industrial Datasets

Louis Hémadou, Héléna Vorobieva, Ewa Kijak, Frederic Jurie
CinePile: A Long Video Question Answering Dataset and Benchmark

Ruchit Rawal, Khalid Saifullah, Ronen Basri, David Jacobs, Gowthami Somepalli, Tom Goldstein
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion

Geonmo Gu, Sanghyuk Chun, Wonjae Kim, HeeJae Jun, Yoohoon Kang, Sangdoo Yun
Compositional Learning of Visually-Grounded Concepts Using Reinforcement

Zijun Lin, Haidi Azaman, M Ganesh Kumar, Cheston Tan
DDOS: The Drone Depth and Obstacle Segmentation Dataset

Benedikt Kolbeinsson, Krystian Mikolajczyk
DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Yuru Jia, Lukas Hoyer, Shengyu Huang, Tianfu Wang, Luc Van Gool, Konrad Schindler, Anton Obukhov
DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection

Donggeun Ko, Sangwoo Jo, Dongjun Lee, Namjun Park, Jaekwang Kim
DISC: Latent Diffusion Models with Self-Distillation from Separated Conditions for Prostate Cancer Grading

Man M. Ho, Elham Ghelichkhan, Yosep Chong, Yufei Zhou, Beatrice S. Knudsen, Tolga Tasdizen
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian
From NeRF to 3DGS: A Leap in Stereo Dataset Quality?

Magnus Kaufmann Gjerde, Filip Slezák, Joakim Bruslund Haurum, Thomas B. Moeslund
GenAI-Bench: A Holistic Benchmark for Compositional Text-to-Visual Generation

Baiqi Li, Zhiqiu Lin, Deepak Pathak, Jiayao Emily Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan
GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning

Mehran Kazemi, Hamidreza Alvari, Ankit Anand, Jialin Wu, Xi Chen, Radu Soricut
Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension

Luca Parolari, Elena Izzo, Lamberto Ballan
HDL-SAM: A Hybrid Deep Learning Framework for High-Resolution Imaging in Scanning Acoustic Microscopy

Akshit Sharma, Ayush Somani, Pragyan Banerjee, Frank Melandsø, Anowarul Habib
Implicit Neural Clustering

Thomas Kreutz, Max Mühlhäuser, Alejandro Sanchez Guinea
Inclusive Portrait Lighting Estimation Model Leveraging Graphic-Based Synthetic Data

Kin Ching Lydia Chau, Tao Li, Ruowei Jiang, Zhi Yu, Panagiotis-Alexandros Bokaris
Intrinsic LoRA: A Generalist Approach for Discovering Knowledge in Generative Models

Xiaodan Du, Nicholas Kolkin, Greg Shakhnarovich, Anand Bhattad
LAESI: Leaf Area Estimation with Synthetic Imagery

Jacek Kałużny, Yannik Schreckenberg, Karol Cyganik, Peter Annighöfer, Soren Pirk, Dominik Michels, Mikolaj Cieslak, Farhah Assaad, Bedrich Benes, Wojtek Palubicki
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation

Linyan Yang, Lukas Hoyer, Mark Weber, Tobias Fischer, Dengxin Dai, Laura Leal-Taixé, Daniel Cremers, Marc Pollefeys, Luc Van Gool
Object-Conditioned Energy-Based Model for Attention Map Alignment in Text-to-Image Diffusion Models

Yasi Zhang, Peiyu Yu, Ying Nian Wu
On the Equivalency, Substitutability, and Flexibility of Synthetic Data

Che-Jui Chang, Danrui Li, Seonghyeon Moon, Mubbasir Kapadia
Paved2Paradise: Cost-Effective and Scalable LiDAR Simulation by Factoring the Real World

Michael A. Alcorn, Noah Schwartz
R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding

Qirui Wu, Sonia Raychaudhuri, Daniel Ritchie, Manolis Savva, Angel X Chang
S2MGen: A Synthetic Skin Mask Generator for Improving Segmentation

Subhadra Gopalakrishnan, Trisha Mittal, Jaclyn Pytlarz, Yuheng Zhao
Self-Distillation on Conditional Spatial Activation Maps for ForeGround-BackGround Segmentation

Yeruru Asrar Ahmed, Anurag Mittal
SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception

Manideep Reddy Aliminati, Bharatesh Chakravarthi, Aayush Atul Verma, Arpitsinh Vaghela, Hua Wei, Xuesong Zhou, Yezhou Yang
SIFTer: Self-improving Synthetic Datasets for Pre-training Classification Models

Ryo Hayamizu, Shota Nakamura, Sora Takashima, Hirokatsu Kataoka, Ikuro Sato, Nakamasa Inoue, Rio Yokota
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?

Hasan Abed Al Kader Hammoud, Hani Itani, Fabio Pizzati, Adel Bibi, Bernard Ghanem
Training Robust Classifiers with Diffusion Denoised Examples

Chandramouli Shama Sastry, Sri Harsha Dumpala, Sageev Oore
Training with Real instead of Synthetic Generated Images Still Performs Better

Scott Geng, Ranjay Krishna, Pang Wei Koh
Uncertainty Inclusive Contrastive Learning for Leveraging Synthetic Images

Fiona Cai, Emily Mu, John Guttag
UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video

Zhi-Hao Lin, Bohan Liu, Yi-Ting Chen, David Forsyth, Jia-Bin Huang, Anand Bhattad, Shenlong Wang
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang
Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation?

Dmitry Yu. Ignatov, Andrey Ignatov, Radu Timofte
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman
XIMAGENET-12: An Explainable Visual Benchmark Dataset for Model Robustness Evaluation

Qiang Li, Dan Zhang, Shengzhao Lei, Xun Zhao, WeiWei Li, Porawit Kamnoedboon, Junhao Dong, Shuyan Li

Call for Papers

We invited papers on the use of synthetic data for training and evaluating computer vision models. We welcomed submissions along two tracks:

Full papers: Up to 8 pages, not including references/appendix.
Short papers: Up to 4 pages, not including references/appendix.

Accepted papers were allocated a poster presentation and displayed on the workshop website. In addition, we offered a Best Long Paper award, a Best Paper Runner-up award, and a Best Short Paper award with oral presentation. Topics included, but were not limited to:

Effectiveness: What is the most effective way to generate and leverage synthetic data? How "realistic" does synthetic data need to be?
Efficiency and scalability: Can we make synthetic data generation more efficient and scalable without sacrificing quality?
Benchmark and evaluation: What benchmarks and evaluation methods are needed to assess the efficacy of synthetic data for computer vision?
Risks and ethical considerations: What ethical questions and risks are associated with synthetic data (e.g., bias amplification), and how can we address them?
Applications: Beyond existing attempts at leveraging synthetic data for training visual recognition and vision-language models, what other tasks in computer vision or related fields (e.g., robotics, NLP) could benefit from synthetic data?
Other open problems: How do we decide which type of data to use, synthetic or real-world data? What is the optimal way to combine both if both are available?