Research
His research interests lie broadly in artificial intelligence, with an emphasis on computer vision and machine learning. Much of his work concerns 2D image generation, completion, and translation, as well as 3D scene reconstruction, generation, and completion, with the goal of building intelligent machines capable of rebuilding a photorealistic virtual world via generative AI. In particular, he is working on the following topics:
3D reconstruction and generation from limited views or videos.
3D editing (decomposition, recomposition, and completion) via object-centric perception.
Generative models for physical world understanding.
Multi-modality (1D, 2D, 3D, and 4D) generation and understanding.
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng ,
Andrea Vedaldi
arXiv , 2023
project page /
PDF /
arXiv /
video /
code
Free3D synthesizes consistent novel views on open-set categories
without the need for explicit 3D representations.
One More Step: A Versatile Plug-and-Play Module for
Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu ,
Jianbin Zheng ,
Chuanxia Zheng ,
Chaoyue Wang ,
Dacheng Tao ,
Tat-Jen Cham
arXiv , 2023
project page /
PDF /
arXiv /
code /
Hugging Face
A versatile plug-and-play module to fix the scheduler flaws for diffusion models.
IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model
Tianhao Wu ,
Chuanxia Zheng ,
Tat-Jen Cham
arXiv , 2023
project page /
PDF /
arXiv /
code
A view-consistent indoor panorama outpainting model based on latent diffusion.
Explicit Correspondence Matching for Generalizable Neural Radiance Fields
Yuedong Chen ,
Haofei Xu ,
Qianyi Wu ,
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
arXiv , 2023
project page /
arXiv /
code
Employing explicit correspondence matching as a geometry prior enables NeRF to generalize across scenes.
Cocktail🍸: Mixing Multi-Modality Controls for Text-Conditional Image Generation
Minghui Hu ,
Jianbin Zheng ,
Daqing Liu ,
Chuanxia Zheng ,
Chaoyue Wang ,
Dacheng Tao ,
Tat-Jen Cham
NeurIPS , 2023
project page /
PDF /
arXiv /
video /
code
We develop a generalized framework for multi-modality control based on text-to-image generation.
Online Clustered Codebook
Chuanxia Zheng ,
Andrea Vedaldi
ICCV , 2023
project page /
PDF /
arXiv /
video /
code /
poster
A simple approach to avoid codebook collapse and achieve 100% codebook utilisation.
Vector Quantized Wasserstein Auto-Encoder
Long Tung Vuong ,
Trung Le ,
He Zhao ,
Chuanxia Zheng ,
Mehrtash Harandi ,
Jianfei Cai ,
Dinh Phung
ICML , 2023
arXiv /
poster /
code (coming soon)
Minimizes the codebook-data distortion, measured as a Wasserstein distance.
UniD3: Unified Discrete Diffusion for Simultaneous Vision-Language Generation
Minghui Hu ,
Chuanxia Zheng ,
Heliang Zheng ,
Tat-Jen Cham ,
Chaoyue Wang ,
Zuopeng Yang ,
Dacheng Tao ,
P. N. Suganthan
ICLR , 2023
project page /
arXiv /
code
A unified discrete diffusion model for simultaneous vision-language generation.
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Chuanxia Zheng ,
Long Tung Vuong ,
Jianfei Cai ,
Dinh Phung
NeurIPS (Spotlight) , 2022
project page /
PDF /
arXiv /
video /
code (Kandinsky2) /
poster
A spatially conditional normalization is introduced to address the repeated artifacts in vector quantized methods.
Object-Compositional Neural Implicit Surfaces
Qianyi Wu ,
Xian Liu ,
Yuedong Chen ,
Kejie Li ,
Chuanxia Zheng ,
Jianfei Cai ,
Jianmin Zheng
ECCV , 2022
project page /
arXiv /
video /
code
Automatically decomposes a scene into 3D instances, trained using only 2D semantic labels and images.
Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
Yuedong Chen ,
Qianyi Wu ,
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
ECCV , 2022
project page /
arXiv /
video /
code
We train a 3D inversion model that converts a 2D semantic map into a 3D NeRF, letting users edit the 3D model through 2D semantic input.
Bridging Global Context Interactions for High-Fidelity Image Completion
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai ,
Dinh Phung
CVPR , 2022
project page /
PDF /
arXiv /
video /
code /
poster
TFill fills in reasonable contents for both foreground object removal and content completion.
Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition
Chuanxia Zheng ,
Duy-Son Dao ,
Guoxian Song ,
Tat-Jen Cham ,
Jianfei Cai
IJCV , 2021
project page /
PDF /
arXiv /
video /
code
We build a high-level scene understanding system that simultaneously models the completed shape and appearance for all instances.
AgileGAN: Stylizing Portraits by Inversion-Consistent Transfer Learning
Guoxian Song ,
Linjie Luo ,
Jing Liu ,
Wan-Chun Ma ,
Chuanxia Zheng ,
Tat-Jen Cham
SIGGRAPH , 2021
project page /
PDF /
video /
code /
Online Demo
A GAN inversion model trained for stylizing portraits.
The Spatially-Correlative Loss for Various Image Translation Tasks
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
CVPR , 2021
project page /
PDF /
arXiv /
video /
code /
poster
We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired I2I translation.
Pluralistic (Free-Form) Image Completion
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
IJCV , 2021
CVPR , 2019
project page /
PDF /
arXiv /
video /
code /
poster
Given a single masked image, the proposed model is able to generate multiple and diverse plausible results.
T2Net: Synthetic-to-Realistic Translation for Depth Estimation Tasks
Chuanxia Zheng ,
Tat-Jen Cham ,
Jianfei Cai
ECCV , 2018
project page /
PDF /
arXiv /
video /
code /
poster
Without any real depth maps, the proposed model estimates depth on real scenes using only synthetic datasets for training.
Academic Services
Conference Reviewer
CVPR 2020, 2021, 2022, 2023 (Outstanding Reviewer)
ICCV 2019, 2021, 2023
ECCV 2020, 2022
NeurIPS 2022, 2023
ICLR 2021, 2022, 2023, 2024
ICML 2023
SIGGRAPH and SIGGRAPH Asia 2022
ICRA 2022
IROS 2022
IJCAI 2022
ACM MM 2021, 2022
Journal Reviewer
TPAMI, IJCV, TIP, JAS, TMM (Outstanding Reviewer Award, 2021), TCSVT, CVIU, TVCJ, NCAA
Teaching
Teaching Assistant, B16: Software Engineering, Undergraduate, Oxford, 2023
Teaching, Generative AI, Graduate, Oxford Summer School, 2023
Teaching Assistant, Advanced Digital Image Processing, Graduate, NTU, 2018-2020
Teaching Assistant, Human-Computer Interaction, Undergraduate, NTU, 2018-2020
Teaching Assistant, Engineering Mathematics, Undergraduate, NTU, 2018-2020