Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset

*Equal Contribution, Corresponding Author
1International Digital Economy Academy (IDEA), 2Tsinghua University, 3The Chinese University of Hong Kong, Shenzhen
Motion-X is a large-scale 3D expressive whole-body motion dataset. It comprises 15.6M precise 3D whole-body pose annotations (i.e., SMPL-X) covering 81.1K motion sequences from a wide range of scenes, together with corresponding semantic labels and pose descriptions.

Figure 1: Unlike (a) previous motion datasets, (b) our dataset captures body motion, facial expressions, and hand gestures. We highlight the comparisons of facial expressions and hand gestures.

Abstract

We propose Motion-X, a large-scale 3D expressive whole-body motion dataset.

Existing motion datasets predominantly contain body-only poses and lack facial expressions, hand gestures, and fine-grained pose descriptions. Moreover, they are primarily collected in limited laboratory scenes with manually labeled textual descriptions, which limits their scalability. To overcome these limitations, we develop a whole-body motion and text annotation pipeline that automatically annotates motion from either single- or multi-view videos and provides comprehensive semantic labels for each video and fine-grained whole-body pose descriptions for each frame. This pipeline is precise, cost-effective, and scalable for further research.

Based on this pipeline, we construct Motion-X, which comprises 15.6M precise 3D whole-body pose annotations (i.e., SMPL-X) covering 81.1K motion sequences from a wide range of scenes. In addition, Motion-X provides 15.6M frame-level whole-body pose descriptions and 81.1K sequence-level semantic labels.
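The two annotation levels described above (one semantic label per sequence, one pose description per frame) can be pictured with a minimal Python sketch. The class and field names here (`FrameAnnotation`, `MotionSequence`, `smplx_params`) are hypothetical and chosen only for illustration; they are not the released Motion-X file format. The last lines use the reported totals to estimate the average sequence length.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FrameAnnotation:
    # Hypothetical container: flattened SMPL-X parameters for one frame.
    # The actual layout and dimensionality are not specified here.
    smplx_params: List[float]
    # Frame-level whole-body pose description (free text).
    pose_description: str


@dataclass
class MotionSequence:
    # Sequence-level semantic label (free text).
    semantic_label: str
    frames: List[FrameAnnotation] = field(default_factory=list)

    def num_frames(self) -> int:
        return len(self.frames)


def average_frames_per_sequence(total_frames: float, total_sequences: float) -> float:
    """Mean number of annotated frames per motion sequence."""
    return total_frames / total_sequences


# Totals as reported for Motion-X: 15.6M frames, 81.1K sequences.
TOTAL_FRAMES = 15.6e6
TOTAL_SEQUENCES = 81.1e3
avg = average_frames_per_sequence(TOTAL_FRAMES, TOTAL_SEQUENCES)
print(f"about {avg:.0f} frames per sequence on average")
```

Under these reported totals, a sequence has roughly 192 annotated frames on average; individual sequences will of course vary in length.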

Comprehensive experiments demonstrate the accuracy of the annotation pipeline and the significant benefit of Motion-X in enhancing expressive, diverse, and natural motion generation, as well as the 3D whole-body human mesh recovery task.

Motion-X Dataset

Figure 2: SMPL-X motion samples from the Motion-X dataset.

Figure 3: Overview of Motion-X: (a) diverse facial expressions, (b) indoor motion with expressive face and hand motions, (c) outdoor motion with challenging poses, and (d) several motion sequences.

Visualization of motion annotations from a large collection of online videos

RGB video is from this website.



Figure 4: Illustration of the overall data collection and annotation pipeline.


Figure 5: Illustration of the motion annotation pipeline.



Figure 6: Illustration of (a) whole-body pose description annotation, (b) an example of the text labels.



Examples of the sub-datasets in Motion-X


Figure 7: Motion-X is a superset consisting of eight public datasets and 15K online videos. We annotate all data via our proposed motion and text annotation methods.


Online Videos

Multi-View Datasets (AIST++ and NTU-RGBD120)

Human-scene-interaction Datasets (GRAB and EgoBody)

Action Recognition Datasets (HAA500 and HuMMan)

Body-Only Motion Capture Datasets (AMASS)

License