Hi!

Intro [CV]

I completed my M.Sc. at the University of California, Irvine, and my B.Eng. at Xidian University. I was fortunate to work at CVMI Lab under the guidance of Prof. Xiaojuan Qi and at SIAT-MMLab under the supervision of Prof. Yu Qiao. During that time, my research focused on 3D point cloud analysis:
a) GDANet (AAAI 2021): A representative work of robust 3D shape recognition.
b) PAConv (CVPR 2021): A representative work of generic point cloud representation.

Currently, I am a Ph.D. candidate at GAP Lab, advised by Prof. Xiaoguang Han, at the Chinese University of Hong Kong (Shenzhen). Throughout my Ph.D. studies, my research has revolved around a central question:
Can we break the barrier of painstaking real-world 3D data acquisition to train intelligent algorithms/agents that can perceive, model, represent, and interact with 3D objects/scenes in the real world?
I attempt to tackle this challenge from two different perspectives:

(1) Upstream Data

i) Data Collection => simplify/eliminate real-world data collection:
a) TO-Scene (ECCV 2022): Combines synthetic with real-world data to avoid scanning tabletop objects.
b) MVImgNet (CVPR 2023): Uses multi-view videos, which are easier to capture, to represent the real 3D world.
c) VC-Agent: Extracts customized video datasets from Internet videos.
ii) Data Generation/Simulation => generate/simulate real-world 3D data:
Stable-Sim2Real: Uses a diffusion model to simulate real-world 3D captures given synthetic input.

(2) Downstream Algorithms

i) Label-Efficient => learning without real-world 3D labels:
MM-3DScene (CVPR 2023): Applies masked modeling to self-supervised pre-training on 3D scenes.
ii) Data-Efficient => learning without real-world 3D data:
SAMPro3D (3DV 2025): Employs 2D SAM for zero-shot 3D scene segmentation without additional training.

I have also led or participated in various projects related to generative models.

a) Free-ATM (ECCV 2024, as project lead): Harnesses diffusion-generated images for representation learning.
b) TASTE-Rob (CVPR 2025, as project lead): Hand-object interaction video generation.
c) RichDreamer (CVPR 2024): Text-to-3D generation.

I am seeking a Postdoctoral or Faculty position starting in Summer/Fall 2025. If you have any openings or are interested, please feel free to contact me.

News

  • [03/2025] One paper (TASTE-Rob) is accepted to CVPR2025. The code and data will be available.
  • [11/2024] One paper (SAMPro3D) is accepted to 3DV2025. Paper and code are available.
  • [08/2024] One paper (survey) is accepted to TPAMI. Paper is available.
  • [07/2024] One paper (Free-ATM) is accepted to ECCV2024. The updated paper and code will be available.
  • [07/2024] MVImgNet was awarded the WAIC Youth Outstanding Paper Nomination, 2024 (2024世界人工智能大会青年优秀论文提名奖).
  • [03/2024] One paper (RichDreamer) is accepted to CVPR2024 as Highlight (2.8%). Paper and code are available.
  • [11/2023] I was recognized as one of the NeurIPS2023 Top Reviewers (9.9%).
  • [08/2023] MVImgNet was awarded the CCF Outstanding Graphics Open-Source Dataset, 2023 (CCF-2023年度优秀图形开源数据集奖).
  • [05/2023] I was named one of the CVPR2023 Outstanding Reviewers (3.3%).
  • [03/2023] Three papers (MVImgNet, MM-3DScene, REC-MV) are accepted to CVPR2023. Papers, dataset, and code are all available.
  • [07/2022] One paper (TO-Scene) is accepted to ECCV2022 for Oral Presentation (2.7%). Paper and dataset are available.
  • [03/2021] One paper (PAConv) is accepted to CVPR2021. Paper and code are available.
  • [12/2020] One paper (GDANet) is accepted to AAAI2021. Paper and code are available.

Publications

(* indicates equal contribution; project lead and corresponding author are noted in each entry)

Selected Representatives

MVImgNet: A Large-scale Dataset of Multi-view Images

Xianggang Yu*, Mutian Xu*, Yidan Zhang*, Haolin Liu*, Chongjie Ye*, et al., Guanying Chen, Shuguang Cui, Xiaoguang Han. (part of project lead)
(CVPR, 2023) [project][paper][code]

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

Mutian Xu*, Runyu Ding*, Hengshuang Zhao, Xiaojuan Qi.
(CVPR, 2021) [paper][code]

TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes

Mutian Xu*, Pei Chen*, Haolin Liu, Xiaoguang Han.
(ECCV, 2022, Oral, 2.7%) [paper][code]

SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Instance Segmentation

Mutian Xu, Xingyilang Yin, Lingteng Qiu, Yang Liu, Xin Tong, Xiaoguang Han.
(3DV, 2025) [project][paper][code]

Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud

Mutian Xu*, David Junhao Zhang*, Zhipeng Zhou, Mingye Xu, Xiaojuan Qi, Yu Qiao.
(AAAI, 2021, best performance on OmniObject3D robust perception) [paper][code]

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency

Mingye Xu*, Mutian Xu*, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao.
(CVPR, 2023) [project][paper][code]

Simulation of Real-Captured 3D Data via Depth Diffusion

Mutian Xu, Chongjie Ye, Haolin Liu, Yushuang Wu, Jiahao Chang, Xiaoguang Han.
(Under submission, 2024)

An Agent for Video Data Collection

Yidan Zhang*, Mutian Xu*, Yiming Hao, Kun Zhou, Jiahao Chang, Xiaoguang Han.
(Under submission, 2024)

Others

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao*, Xingchen Liu*, Mutian Xu, Yiming Hao, Weikai Chen, Xiaoguang Han.
(CVPR, 2025) [project][paper]

Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images

David Junhao Zhang, Mutian Xu, Jay Zhangjie Wu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou. (project lead)
(ECCV, 2024) [paper]

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

Lingteng Qiu*, Guanying Chen*, Xiaodong Gu*, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han.
(CVPR, 2024, Highlight, 2.8%) [project][paper][code]

REC-MV: REconstructing 3D Dynamic Cloth from Monocular Videos

Lingteng Qiu*, Guanying Chen*, Jiapeng Zhou, Mutian Xu, Junle Wang, Xiaoguang Han.
(CVPR, 2023) [project][paper][code]

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Chaoqi Chen*, Yushuang Wu*, Qiyuan Dai*, Hong-Yu Zhou*, Mutian Xu, Sibei Yang, Xiaoguang Han, Yizhou Yu.
(TPAMI, 2024) [paper]

Activities & Certificates

WAIC Youth Outstanding Paper Nomination Award (世界人工智能大会青年优秀论文提名奖, MVImgNet), 2024
Top Reviewer, NeurIPS 2023 (9.9%)
Outstanding Reviewer, CVPR 2023 (3.3%)
CCF Outstanding Graphics Open-Source Dataset (CCF年度优秀图形开源数据集奖, MVImgNet), 2023
Outstanding Teaching Assistant Award of CUHKSZ, 2022/24
Journal Reviewer: TIP, IJCV, TVCG, NEUCOM, TMM, MVAP
Conference Reviewer: CVPR 23/24/25, ICCV 21/23/25, ECCV 24, ICLR 24/25, ICML 24/25, NeurIPS 23/25, IJCAI 24, WACV 24/25, ACCV 24

Experience

Talks

"Why Do I Need Papers?" (我要这paper有何用), Valse Webinar 2024
Outstanding Student Forum of Valse 2023
Outstanding Student Forum of China 3DV 2023
Youth PhD Talk - ECCV 2022, invited by AI-TIME

Teaching

CUHKSZ-CSC1001: Introduction to Computer Science: Programming Methodology (Leading TA)
CUHKSZ-CSC1002: Computational Laboratory
CUHKSZ-CSC3002: Introduction to Computer Science: Programming Paradigms

Miscellaneous

3rd place in the 31st School Singer Contest, Xidian University
Piano Professional Certificate Level 10