Chuanxia Zheng

Chuanxia Zheng is a Marie Skłodowska-Curie Actions (MSCA) Fellow in VGG at the University of Oxford, working with Prof. Andrea Vedaldi on feed-forward 3D reconstruction and generation.

Before that, he spent one year at Monash University, where he worked as a Research Fellow with Prof. Jianfei Cai and Prof. Dinh Phung on codebook learning for Generative AI. He received his PhD degree from the SCSE at Nanyang Technological University, supervised by Prof. Tat-Jen Cham and Prof. Jianfei Cai on 2D generation, translation and completion. His thesis, "Synthesizing Photorealistic Images", was awarded the NTU Outstanding PhD Thesis Award 2022.

Email  /  CV  /  Google Scholar  /  Github  /  Linkedin  /  Twitter

His research interests focus on computer vision and machine learning, especially for creative AI. He has done a wide range of work on 2D and 3D scene synthesis, with the goal of synthesizing a photorealistic, physical natural world via generative AI. In particular, he works on the following topics:

  • 3D geometry and appearance synthesis from limited views or videos.
  • 3D editing via object-centric perception.
  • Generative models for physical understanding and interaction.
  • Multi-modality (1D, 2D, 3D, and 4D) generation.

Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs
Brandon Smart, Chuanxia Zheng, Iro Laina, Victor Adrian Prisacariu
arXiv, 2024
project page / arXiv / code / demo

A feed-forward model that reconstructs 3D structure and appearance from uncalibrated images.

Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
arXiv, 2024
project page / arXiv / code / data / demo

An interactive video generative model that can serve as a motion prior for part-level dynamics.

Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João Henriques, Christian Rupprecht, Andrea Vedaldi
arXiv, 2024
project page / arXiv / code / demo

Flash3D is a fast, highly efficient model for 3D scene reconstruction from a single image, trainable on a single GPU in one day.

Explicit Correspondence Matching for Generalizable Neural Radiance Fields
Yuedong Chen, Haofei Xu, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
arXiv, 2023
project page / arXiv / code

Employing explicit correspondence matching as a geometry prior enables NeRF to generalize across scenes.

DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
ECCV, 2024
project page / arXiv / code / data / demo

Modeling physical interaction with objects in vision through part-level dragging.

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai
ECCV, 2024 (Oral)
project page / arXiv / code

A cost volume representation for efficiently predicting 3D Gaussians from sparse multi-view images in a single forward pass.

ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition
Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham, Qianyi Wu
ECCV, 2024
project page / arXiv / code

A self-organized 3D segmentation model via neural implicit surface representation.

Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng, Andrea Vedaldi
CVPR, 2024
project page / PDF / arXiv / video / code / poster

Free3D synthesizes consistent novel views on open-set categories without the need for explicit 3D representations.

Amodal Ground Truth and Completion in the Wild
Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman
CVPR, 2024
project page / PDF / arXiv / code

A Stable Diffusion-based network that solves the amodal completion problem for any category, without requiring an occluder mask.

One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu, Jianbin Zheng, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham
CVPR, 2024
project page / PDF / arXiv / code / demo

A versatile plug-and-play module that fixes schedule flaws in diffusion models.

PanoDiffusion: 360-degree Panorama Outpainting via Diffusion
Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham
ICLR, 2024
project page / PDF / arXiv / code

A view-consistent indoor panorama outpainting model built on latent diffusion models.

Bridging Global Context Interactions for High-Fidelity Pluralistic Image Completion
Chuanxia Zheng, Guoxian Song, Tat-Jen Cham, Jianfei Cai, Linjie Luo, Dinh Phung
T-PAMI, 2024
project page / PDF / ieeexplore / video / code

PICFormer achieves pluralistic image completion, producing multiple diverse solutions with a transformer-based architecture at a much faster inference speed.

Cocktail🍸: Mixing Multi-Modality Controls for Text-Conditional Image Generation
Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham
NeurIPS, 2023
project page / PDF / arXiv / video / code

We develop a generalized framework for multi-modality control based on text-to-image generation.

Online Clustered Codebook
Chuanxia Zheng, Andrea Vedaldi
ICCV, 2023
project page / PDF / arXiv / video / code / poster

A simple approach to avoid codebook collapse and achieve 100% codebook utilisation.

Vector Quantized Wasserstein Auto-Encoder
Long Tung Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung
ICML, 2023
arXiv / poster

Minimizes the codebook-data distortion, formulated as a Wasserstein distance.

UniD3: Unified Discrete Diffusion for Simultaneous Vision-Language Generation
Minghui Hu, Chuanxia Zheng, Heliang Zheng, Tat-Jen Cham, Chaoyue Wang, Zuopeng Yang, Dacheng Tao, P. N. Suganthan
ICLR, 2023
project page / arXiv / code

A unified discrete diffusion model for simultaneous vision-language generation.

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Chuanxia Zheng, Long Tung Vuong, Jianfei Cai, Dinh Phung
NeurIPS, 2022 (Spotlight)
project page / PDF / arXiv / video / code (Kandinsky2) / poster

A spatially conditional normalization is introduced to address the repeated artifacts in vector-quantized methods.

Object-Compositional Neural Implicit Surfaces
Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, Jianmin Zheng
ECCV, 2022
project page / arXiv / video / code

Automatically decomposes a scene into 3D instances, trained using only 2D semantic labels and images.

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
ECCV, 2022
project page / arXiv / video / code

A 3D inversion model that converts a 2D semantic map into a 3D NeRF and lets users edit the 3D model through 2D semantic input.

Bridging Global Context Interactions for High-Fidelity Image Completion
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai, Dinh Phung
CVPR, 2022
project page / PDF / arXiv / video / code / poster

TFill fills in reasonable content for both foreground object removal and content completion.

Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition
Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, Jianfei Cai
IJCV, 2021
project page / PDF / arXiv / video / code

A high-level scene understanding system that simultaneously models the completed shape and appearance for all instances.

AgileGAN: Stylizing Portraits by Inversion-Consistent Transfer Learning
Guoxian Song, Linjie Luo, Jing Liu, Wan-Chun Ma, Chuanxia Zheng, Tat-Jen Cham
project page / PDF / video / code / Online Demo

A GAN inversion model trained for stylizing portraits.

The Spatially-Correlative Loss for Various Image Translation Tasks
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
CVPR, 2021
project page / PDF / arXiv / video / code / poster

We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired I2I translation.

Pluralistic (Free-Form) Image Completion
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
IJCV, 2021
CVPR, 2019
project page / PDF / arXiv / video / code / poster

Given a single masked image, the proposed model is able to generate multiple and diverse plausible results.

T2Net: Synthetic-to-Realistic Translation for Depth Estimation Tasks
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
ECCV, 2018
project page / PDF / arXiv / video / code / poster

Without any real depth maps, the proposed model estimates depth on real scenes using only synthetic datasets for training.

Academic Services

Area Chair

BMVC    2024
ACM MM    2024

Conference Reviewer

CVPR    2020, 2021, 2022, 2023 (Outstanding Reviewer), 2024
ICCV    2019, 2021, 2023
ECCV    2020, 2022, 2024
NeurIPS    2022, 2023, 2024
ICLR    2021, 2022, 2023, 2024, 2025
ICML    2023
SIGGRAPH & SIGGRAPH Asia    2022

Journal Reviewer

TPAMI, IJCV, TIP, TMM (Outstanding Reviewer Award, 2021), TCSVT, CVIU, TVCJ

  • Teaching Assistant, B16: Software Engineering, Undergraduate, Oxford, 2023
  • Teaching, Generative AI, Graduate, Oxford Summer School, 2023
  • Teaching Assistant, Advanced Digital Image Processing, Graduate, NTU, 2018-2020
  • Teaching Assistant, Human-Computer Interaction, Undergraduate, NTU, 2018-2020
  • Teaching Assistant, Engineering Mathematics, Undergraduate, NTU, 2018-2020

Last updated Oct. 2024.