A Large-scale Video Text Dataset

DAVAR LAB

[Download (please email us; see Contact)]

Introduction

Here, we release a large-scale video text dataset named LSVTD. In recent years, research on video scene text has remained relatively unpopular despite its promising application prospects. Existing video scene text datasets are limited in the number of videos and scenarios they cover, which may restrain research on video scene text spotting. We therefore collected and annotated LSVTD, which contains 129 scene videos acquired from 21 typical real-life scenarios.

The dataset contains 129 video clips (ranging from several seconds to over one minute long) from 21 real-life scenarios. It extends the original LSVTD by adding 15 videos for the 'harbor surveillance' scenario and 14 videos for the 'train watch' scenario, to support video text spotting in industrial transportation applications.

The main characteristics of LSVTD are as follows:

  • Larger scale. It contains 129 video clips, more than most existing scene video text datasets.
  • Diversified scenarios. It covers 21 indoor and outdoor real-life scenarios (see the figure on the left).
  • Different capture devices. Videos are collected with multiple kinds of video cameras: (1) mobile phone cameras in various indoor scenarios (e.g., bookstore, office building) and outdoor street views; (2) HD cameras in traffic and harbor surveillance; and (3) car DVR cameras in fast-moving outdoor scenarios (e.g., city road, highway).
  • Multilingual text instances. The dataset contains multiple languages, divided into two major categories: alphanumeric and non-alphanumeric.
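The alphanumeric / non-alphanumeric split above can be illustrated with a simple character-based check. This is only an illustrative rule sketched for this page, not the official LSVTD evaluation protocol:

```python
import string

# ASCII letters and digits define the "alphanumeric" category in this sketch.
ALPHANUMERIC = set(string.ascii_letters + string.digits)

def category(transcription: str) -> str:
    """Classify a text instance as 'alphanumeric' if every non-space
    character is an ASCII letter or digit; otherwise 'non-alphanumeric'.
    Illustrative only; the dataset's own protocol may differ."""
    chars = [c for c in transcription if not c.isspace()]
    if chars and all(c in ALPHANUMERIC for c in chars):
        return "alphanumeric"
    return "non-alphanumeric"

print(category("LSVTD2019"))  # alphanumeric
print(category("视频文本"))    # non-alphanumeric
```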
Dataset Released

  • Videos: 129 videos are provided in total.
  • Annotations: both training and testing annotations (.xml) are provided.
  • Dataset access: We release the dataset for open access at [link].
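Annotation files are XML. A minimal sketch of loading per-frame text instances with Python's standard `xml.etree.ElementTree` is shown below; the element and attribute names (`frame`, `object`, `Transcription`, `Point`) are assumptions for illustration, and the actual LSVTD schema may differ:

```python
import xml.etree.ElementTree as ET

# Hypothetical LSVTD-style annotation: each frame holds text instances
# described by a transcription and a quadrilateral of four points.
SAMPLE = """<annotation>
  <frame ID="1">
    <object Transcription="LSVTD">
      <Point x="10" y="20"/>
      <Point x="110" y="20"/>
      <Point x="110" y="50"/>
      <Point x="10" y="50"/>
    </object>
  </frame>
</annotation>"""

def parse_annotation(xml_text):
    """Return {frame_id: [(transcription, [(x, y), ...]), ...]}."""
    root = ET.fromstring(xml_text)
    frames = {}
    for frame in root.iter("frame"):
        fid = int(frame.get("ID"))
        instances = []
        for obj in frame.iter("object"):
            points = [(int(p.get("x")), int(p.get("y")))
                      for p in obj.iter("Point")]
            instances.append((obj.get("Transcription"), points))
        frames[fid] = instances
    return frames

boxes = parse_annotation(SAMPLE)
print(boxes[1][0][0])  # transcription of the first instance in frame 1
```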
Contact

If you have any questions about the dataset, please contact Jing Lu (lujing6kh@163.com) or Zhanzhan Cheng (11821104@zju.edu.cn).

Terms of Use

  • The public annotations belong to Zhejiang University and are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
  • The images belong to Zhejiang University and are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Change Log

  • 2021-08-10 (GMT+8): Added 29 more video clips. The current version is used in the ICDAR 2021 competition.
  • 2021-06-22 (GMT+8): Testing dataset available.
  • 2020-03-06 (GMT+8): Dataset updated by removing some consecutive background frames and adding 5 extra videos.
  • 2019-10-25 (GMT+8): Dataset released.
Recommended Citation

If you find this dataset helpful to your research, please cite:
    @inproceedings{cheng2019you,
      title={You Only Recognize Once: Towards Fast Video Text Spotting},
      author={Cheng, Zhanzhan and Lu, Jing and Niu, Yi and Pu, Shiliang and Wu, Fei and Zhou, Shuigeng},
      booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
      pages={855--863},
      year={2019},
      organization={ACM}
    }