CelebV-HQ: A Large-scale Video Facial Attributes Dataset

ECCV 2022


Hao Zhu1*, Wayne Wu1*†, Wentao Zhu2, Liming Jiang3,
Siwei Tang1, Li Zhang1, Ziwei Liu3, Chen Change Loy3
(*Equal contribution)
1SenseTime Research, 2Peking University, 3S-Lab, Nanyang Technological University

Abstract

Large-scale datasets have played an indispensable role in the recent success of face generation/editing and have significantly facilitated advances in emerging research fields. However, the academic community still lacks a video dataset with diverse facial attribute annotations, which is crucial for face-related video research. We propose a large-scale, high-quality, and diverse video dataset, named the High-Quality Celebrity Video Dataset (CelebV-HQ), with rich facial attribute annotations.

CelebV-HQ contains 35,666 video clips involving 15,653 identities and 83 manually labeled facial attributes covering appearance, action, and emotion. We conduct a comprehensive analysis in terms of ethnicity, age, brightness, motion smoothness, head pose diversity, and data quality to demonstrate the diversity and temporal coherence of CelebV-HQ. Besides, its versatility and potential are validated on unconditional video generation and video facial attribute editing tasks.

For more details of the dataset, please refer to the paper "CelebV-HQ: A Large-scale Video Facial Attributes Dataset".

Overview Video


Statistics

The distributions of each attribute. CelebV-HQ has a diverse distribution on each attribute category. Overall, CelebV-HQ contains diverse facial attributes and natural distributions, bringing new opportunities and challenges to the community.




Demo of Appearances


Appearance: Eyeglasses

Appearance: Bald

Appearance: Wearing Masks


Demo of Actions


Action: Laugh

Action: Close Eyes

Action: Talking


Demo of Emotions


Emotion: Sad

Emotion: Anger

Emotion: Happy

Face Datasets Comparison

CelebV-HQ consists of 35,666 video clips of 3 to 20 seconds each, involving 15,653 identities, with a total video duration of about 65 hours. CelebV-HQ also contains 83 annotations, including 40 appearance attributes, 35 action attributes, and 8 emotion attributes.
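As a quick sanity check on the figures above, the following minimal sketch derives a few summary numbers from them. The constant and function names are illustrative assumptions, not part of any CelebV-HQ tooling, and the total duration is only approximate ("about 65 hours"), so the derived averages are rough estimates.

```python
# Figures quoted in the dataset description above.
NUM_CLIPS = 35_666
NUM_IDENTITIES = 15_653
TOTAL_HOURS = 65  # approximate ("about 65 hours")
ATTRIBUTES = {"appearance": 40, "action": 35, "emotion": 8}


def summarize():
    """Derive rough summary statistics from the quoted figures."""
    return {
        # 40 + 35 + 8 should match the stated 83 annotations.
        "total_attributes": sum(ATTRIBUTES.values()),
        # Average clip length in seconds, from the approximate total duration.
        "mean_clip_seconds": TOTAL_HOURS * 3600 / NUM_CLIPS,
        # Average number of clips per identity.
        "clips_per_identity": NUM_CLIPS / NUM_IDENTITIES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under these figures the mean clip length comes out at roughly 6.6 seconds, which sits inside the stated 3 to 20 second range, and each identity appears in about 2.3 clips on average.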


Benchmark

Unconditional Video Generation

We construct a benchmark for the unconditional video generation task, covering four currently prevalent models (VideoGPT, MoCoGAN-HD, DIGAN, and StyleGAN-V) on four face video datasets (FaceForensics, Vox, MEAD, and CelebV-HQ).
As a challenging real-world dataset, CelebV-HQ still leaves room for the community to improve.

Table: FVD/FID Metrics Comparison

VideoGPT


MoCoGAN-HD


DIGAN


StyleGAN-V

Agreement

  • The CelebV-HQ dataset is available for non-commercial research purposes only.
  • All videos of the CelebV-HQ dataset are obtained from the Internet and are not the property of our institutions. Our institutions are not responsible for the content or the meaning of these videos.
  • You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, any portion of the videos and any portion of derived data.
  • You agree not to further copy, publish or distribute any portion of the CelebV-HQ dataset, except that copies of the dataset may be made for internal use at a single site within the same organization.

Dataset

We provide a download tool that automatically fetches and processes videos from YouTube. We highly recommend using this tool to acquire the dataset. In addition, as some links may no longer be available, we host the full version of CelebV-HQ. Please contact us if needed.


Acknowledgements

We sincerely thank Zongcai Sun for his help with source data preparation and the download tool development. This work is partly supported by Shanghai AI Laboratory and SenseTime Research. It is also supported by NTU NAP, MOE AcRF Tier 1 (2021-T1-001-088), and the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s).


BibTeX

If you find this helpful, please cite our work:

@inproceedings{zhu2022celebvhq,
  title={{CelebV-HQ}: A Large-Scale Video Facial Attributes Dataset},
  author={Zhu, Hao and Wu, Wayne and Zhu, Wentao and Jiang, Liming and Tang, Siwei and Zhang, Li and Liu, Ziwei and Loy, Chen Change},
  booktitle={ECCV},
  year={2022}
}