Research
His research interests focus on computer vision and machine learning, especially for creative AI.
He has done a wide range of work on 2D and 3D scene synthesis, with the goal of synthesizing the photorealistic, physical natural world via generative AI.
In particular, he works on the following topics:
3D geometry and appearance synthesis from limited views or videos.
3D editing via object-centric perception.
Generative models for physical understanding and interaction.
Multi-modal (1D, 2D, 3D, and 4D) generation.
Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs
Brandon Smart ,
Chuanxia Zheng ,
Iro Laina ,
Victor Adrian Prisacariu
arXiv , 2024
project page /
arXiv /
code /
demo
A feed-forward model that reconstructs 3D structure and appearance from uncalibrated images.
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Ruining Li ,
Chuanxia Zheng ,
Christian Rupprecht ,
Andrea Vedaldi
arXiv , 2024
project page /
arXiv /
code /
data /
demo
An interactive video generative model that can serve as a motion prior for part-level dynamics.
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
Stanislaw Szymanowicz ,
Eldar Insafutdinov ,
Chuanxia Zheng ,
Dylan Campbell ,
João Henriques ,
Christian Rupprecht ,
Andrea Vedaldi
arXiv , 2024
project page /
arXiv /
code /
demo
Flash3D is a fast, highly efficient model for 3D scene reconstruction from a single image, trainable on a single GPU in one day.
Explicit Correspondence Matching for Generalizable Neural Radiance Fields
Yuedong Chen ,
Haofei Xu ,
Qianyi Wu ,
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
arXiv , 2023
project page /
arXiv /
code
Employing explicit correspondence matching as a geometry prior enables NeRF to generalize across scenes.
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Ruining Li ,
Chuanxia Zheng ,
Christian Rupprecht ,
Andrea Vedaldi
ECCV , 2024
project page /
arXiv /
code /
data /
demo
A model of physical interaction with objects that supports part-level dragging.
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Yuedong Chen ,
Haofei Xu ,
Chuanxia Zheng ,
Bohan Zhuang ,
Marc Pollefeys ,
Andreas Geiger ,
Tat-Jen Cham ,
Jianfei Cai
ECCV , 2024
(Oral)
project page /
arXiv /
code
A cost volume representation for efficiently predicting 3D Gaussians from sparse multi-view images in a single forward pass.
ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition
Tianhao Wu ,
Chuanxia Zheng ,
Tat-Jen Cham ,
Qianyi Wu
ECCV , 2024
project page /
arXiv /
code
A self-organized 3D segmentation model via neural implicit surface representation.
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng ,
Andrea Vedaldi
CVPR , 2024
project page /
PDF /
arXiv /
video /
code /
poster
Free3D synthesizes consistent novel views for open-set categories without the need for explicit 3D representations.
Amodal Ground Truth and Completion in the Wild
Guanqi Zhan ,
Chuanxia Zheng ,
Weidi Xie ,
Andrew Zisserman
CVPR , 2024
project page /
PDF /
arXiv /
code
A Stable Diffusion-based network that solves the amodal completion problem for any category, without requiring an occluder mask.
One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu ,
Jianbin Zheng ,
Chuanxia Zheng ,
Chaoyue Wang ,
Dacheng Tao ,
Tat-Jen Cham
CVPR , 2024
project page /
PDF /
arXiv /
code /
demo
A versatile plug-and-play module that fixes schedule flaws in diffusion models.
PanoDiffusion: 360-degree Panorama Outpainting via Diffusion
Tianhao Wu ,
Chuanxia Zheng ,
Tat-Jen Cham
ICLR , 2024
project page /
PDF /
arXiv /
code
An indoor panorama outpainting model using latent diffusion models with view consistency.
Bridging Global Context Interactions for High-Fidelity Pluralistic Image Completion
Chuanxia Zheng ,
Guoxian Song ,
Tat-Jen Cham ,
Jianfei Cai ,
Linjie Luo ,
Dinh Phung
T-PAMI , 2024
project page /
PDF /
ieeexplore /
video /
code
PICFormer achieves pluralistic image completion, producing multiple diverse solutions with a transformer-based architecture at a much faster inference speed.
Cocktail🍸: Mixing Multi-Modality Controls for Text-Conditional Image Generation
Minghui Hu ,
Jianbin Zheng ,
Daqing Liu ,
Chuanxia Zheng ,
Chaoyue Wang ,
Dacheng Tao ,
Tat-Jen Cham
NeurIPS , 2023
project page /
PDF /
arXiv /
video /
code
We develop a generalized framework for multi-modality control based on text-to-image generation.
Online Clustered Codebook
Chuanxia Zheng ,
Andrea Vedaldi
ICCV , 2023
project page /
PDF /
arXiv /
video /
code /
poster
A simple approach to avoid codebook collapse and achieve 100% codebook utilisation.
Vector Quantized Wasserstein Auto-Encoder
Long Tung Vuong ,
Trung Le ,
He Zhao ,
Chuanxia Zheng ,
Mehrtash Harandi ,
Jianfei Cai ,
Dinh Phung
ICML , 2023
arXiv /
poster
Minimizes the codebook-data distortion, measured as a Wasserstein distance.
UniD3: Unified Discrete Diffusion for Simultaneous Vision-Language Generation
Minghui Hu ,
Chuanxia Zheng ,
Heliang Zheng ,
Tat-Jen Cham ,
Chaoyue Wang ,
Zuopeng Yang ,
Dacheng Tao ,
P. N. Suganthan
ICLR , 2023
project page /
arXiv /
code
A unified discrete diffusion model for simultaneous vision-language generation.
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Chuanxia Zheng ,
Long Tung Vuong ,
Jianfei Cai ,
Dinh Phung
NeurIPS , 2022
(Spotlight)
project page /
PDF /
arXiv /
video /
code (Kandinsky2) /
poster
A spatially conditional normalization is introduced to address the repeated artifacts in vector quantized methods.
Object-Compositional Neural Implicit Surfaces
Qianyi Wu ,
Xian Liu ,
Yuedong Chen ,
Kejie Li ,
Chuanxia Zheng ,
Jianfei Cai ,
Jianmin Zheng
ECCV , 2022
project page /
arXiv /
video /
code
Automatically decomposes a scene into 3D instances, trained using only 2D semantic labels and images.
Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
Yuedong Chen ,
Qianyi Wu ,
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
ECCV , 2022
project page /
arXiv /
video /
code
A 3D inversion model that converts a 2D semantic map into a 3D NeRF and lets users edit the 3D model through 2D semantic input.
Bridging Global Context Interactions for High-Fidelity Image Completion
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai ,
Dinh Phung
CVPR , 2022
project page /
PDF /
arXiv /
video /
code /
poster
TFill fills in reasonable content for both foreground object removal and content completion.
Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition
Chuanxia Zheng ,
Duy-Son Dao ,
Guoxian Song ,
Tat-Jen Cham ,
Jianfei Cai
IJCV , 2021
project page /
PDF /
arXiv /
video /
code
A high-level scene understanding system that simultaneously models the completed shape and appearance for all instances.
AgileGAN: Stylizing Portraits by Inversion-Consistent Transfer Learning
Guoxian Song ,
Linjie Luo ,
Jing Liu ,
Wan-Chun Ma ,
Chuanxia Zheng ,
Tat-Jen Cham
SIGGRAPH , 2021
project page /
PDF /
video /
code /
Online Demo
A GAN inversion model trained for stylizing portraits.
The Spatially-Correlative Loss for Various Image Translation Tasks
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
CVPR , 2021
project page /
PDF /
arXiv /
video /
code /
poster
We propose a novel spatially-correlative loss that is simple, efficient, and effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation.
Pluralistic (Free-Form) Image Completion
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
IJCV , 2021
CVPR , 2019
project page /
PDF /
arXiv /
video /
code /
poster
Given a single masked image, the proposed model is able to generate multiple and diverse plausible results.
T2Net: Synthetic-to-Realistic Translation for Depth Estimation Tasks
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
ECCV , 2018
project page /
PDF /
arXiv /
video /
code /
poster
Without any real depth maps, the proposed model estimates depth maps for real scenes using only synthetic datasets.
Academic Services
Area Chair
Conference Reviewer
CVPR 2020, 2021, 2022, 2023 (Outstanding Reviewer), 2024
ICCV 2019, 2021, 2023
ECCV 2020, 2022, 2024
NeurIPS 2022, 2023, 2024
ICLR 2021, 2022, 2023, 2024, 2025
ICML 2023
SIGGRAPH&Asia 2022
Journal Reviewer
TPAMI, IJCV, TIP, TMM (Outstanding Reviewer Award, 2021), TCSVT, CVIU, TVCJ
Teaching
Teaching Assistant, B16: Software Engineering, Undergraduate, Oxford, 2023
Teaching, Generative AI, Graduate, Oxford Summer School, 2023
Teaching Assistant, Advanced Digital Image Processing, Graduate, NTU, 2018-2020
Teaching Assistant, Human-Computer Interaction, Undergraduate, NTU, 2018-2020
Teaching Assistant, Engineering Mathematics, Undergraduate, NTU, 2018-2020