Large-scale datasets have played an indispensable role in the recent success of face generation and editing, and have significantly facilitated advances in emerging research fields. However, the academic community still lacks a video dataset with diverse facial attribute annotations, which is crucial for face-related video research. We propose a large-scale, high-quality, and diverse video dataset, named the High-Quality Celebrity Video Dataset (CelebV-HQ), with rich facial attribute annotations.
CelebV-HQ contains 35,666 video clips involving 15,653 identities and 83 manually labeled facial attributes covering appearance, action, and emotion. We conduct a comprehensive analysis in terms of ethnicity, age, brightness, motion smoothness, head pose diversity, and data quality to demonstrate the diversity and temporal coherence of CelebV-HQ. In addition, its versatility and potential are validated on unconditional video generation and video facial attribute editing tasks.
For more details on the dataset, please refer to the paper "CelebV-HQ: A Large-Scale Video Facial Attributes Dataset".
Attribute distributions: CelebV-HQ has a diverse distribution within each attribute category. Overall, CelebV-HQ contains diverse facial attributes with natural distributions, bringing new opportunities and challenges to the community.
CelebV-HQ consists of 35,666 video clips of 3 to 20 seconds each, involving 15,653 identities, with a total video duration of about 65 hours. CelebV-HQ also contains 83 annotations, including 40 appearance attributes, 35 action attributes, and 8 emotion attributes.
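The figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the numbers quoted in this README; the function and constant names are illustrative and are not part of any CelebV-HQ tooling. The derived averages are implied by the rounded totals (in particular the "about 65 hours" figure) and are not statistics reported by the authors.

```python
# Figures quoted in the text above.
NUM_CLIPS = 35_666
NUM_IDENTITIES = 15_653
TOTAL_HOURS = 65  # approximate ("about 65 hours")
ATTRIBUTE_COUNTS = {"appearance": 40, "action": 35, "emotion": 8}


def total_attributes(counts):
    """Sum the per-category attribute counts."""
    return sum(counts.values())


def mean_clip_seconds(total_hours, num_clips):
    """Average clip length in seconds implied by the total duration."""
    return total_hours * 3600 / num_clips


def clips_per_identity(num_clips, num_identities):
    """Average number of clips per identity."""
    return num_clips / num_identities


if __name__ == "__main__":
    print("total attributes:", total_attributes(ATTRIBUTE_COUNTS))
    print("mean clip length (s): %.2f" % mean_clip_seconds(TOTAL_HOURS, NUM_CLIPS))
    print("clips per identity: %.2f" % clips_per_identity(NUM_CLIPS, NUM_IDENTITIES))
```

The three categories sum to the stated 83 attributes. The implied mean clip length is roughly 6.6 seconds, which sits inside the stated 3 to 20 second range, and each identity appears in a little over two clips on average.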
We construct a benchmark for the unconditional video generation task, evaluating four currently prevalent models (VideoGPT, MoCoGAN-HD, DIGAN, and StyleGAN-V) on four face video datasets (FaceForensics, Vox, MEAD, and CelebV-HQ).
As a challenging real-world dataset, CelebV-HQ still leaves room for the community to improve on these results.
We provide a download tool that automatically fetches and processes videos from YouTube. We highly recommend using this tool to acquire the dataset. In addition, as some links may no longer be available, we host the full version of CelebV-HQ. Please contact us if needed.
We sincerely thank Zongcai Sun for his help with source data preparation and the download tool development. This work is partly supported by Shanghai AI Laboratory and SenseTime Research. It is also supported by NTU NAP, MOE AcRF Tier 1 (2021-T1-001-088), and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
If you find this helpful, please cite our work:
@inproceedings{zhu2022celebvhq,
title={{CelebV-HQ}: A Large-Scale Video Facial Attributes Dataset},
author={Zhu, Hao and Wu, Wayne and Zhu, Wentao and Jiang, Liming and Tang, Siwei and Zhang, Li and Liu, Ziwei and Loy, Chen Change},
booktitle={ECCV},
year={2022}
}