Modular Interactive Video Object Segmentation:
Interaction-to-Mask, Propagation and Difference-Aware Fusion
BL30K Dataset
BL30K:
  • 29,989 synthetic videos using 51,300 animated 3D models
  • Each video has 160 frames
  • Each frame has a resolution of 768*512, with pixel-accurate annotation
  • 3-5 objects per video
  • Object intersections are minimized using a greedy avoidance algorithm
Download:

We split the dataset into six segments of approximately 5K videos each. We found that about half of the data is probably enough to reach full performance (although we still used all of it), but less than one-sixth (about 5K videos) is insufficient.

Each segment is about 115GB in size, for roughly 700GB in total. In my experience, Google Drive is much faster than OneDrive, but your mileage may vary.

[Google Drive]     [OneDrive]
Or you can use the Python script (download_bl30k.py) provided in [this repo] for automatic download and extraction. The script uses Google Drive links.

UST Mirror (Reliability not guaranteed, speed throttled, do not use if others are available):
ckcpu1.cse.ust.hk:8080/MiVOS/BL30K_{a-f}.tar (Replace {a-f} with the part that you need).
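
The {a-f} placeholder above expands to one archive per segment. As a minimal sketch, the snippet below builds the six mirror URLs so they can be passed to a downloader of your choice. The function name bl30k_mirror_urls is hypothetical, and the "http://" scheme is an assumption, since the mirror is listed without one. This is not the download_bl30k.py script from the repo, which uses the Google Drive links instead.

```python
# Host and path are copied from the mirror line above; the "http://" scheme
# is an assumption, because the page lists the host without one.
MIRROR_BASE = "http://ckcpu1.cse.ust.hk:8080/MiVOS"

# The dataset is split into six segments, labelled a through f.
VALID_PARTS = "abcdef"


def bl30k_mirror_urls(parts=VALID_PARTS):
    """Return the mirror URL for each requested segment letter (hypothetical helper)."""
    urls = []
    for part in parts:
        if part not in VALID_PARTS:
            raise ValueError(f"unknown segment {part!r}; expected one of a-f")
        urls.append(f"{MIRROR_BASE}/BL30K_{part}.tar")
    return urls


if __name__ == "__main__":
    # Print one URL per line, e.g. to save to a file for a download tool.
    for url in bl30k_mirror_urls():
        print(url)
```

To fetch only part of the dataset, pass the letters you need, for example bl30k_mirror_urls("abc") for the first three segments (about half of the data).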
If you use this dataset, please cite our paper:


@inproceedings{MiVOS_2021,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}