ICCV 2021 Workshop SSLAD Track 1 - 2D Object Detection

DAVAR LAB

Official Website

[Technical Report]

Introduction

Autonomous driving technology has advanced significantly in recent years because of its great potential for reducing accidents, saving human lives and improving efficiency. In this competition, we focus on the 2D object detection task with limited annotations and massive unlabeled images. Formally, given 5k fully-annotated training images and 10M unlabeled images, the task of this track is to infer the bounding-box location and category of the objects in each image of the validation and test sets.

We won 2nd place out of 209 teams in this challenge using a semi-supervised learning solution.

Solution Description

In this competition, we proposed a holistic SS-OD (Semi-Supervised Object Detection) framework called O2O (Offline to Online), which combines the advantages of self-training based and consistency-regularization based methods. We train our model in the following steps: 1) train a baseline detector on the labeled images, 2) train the student detector on the unlabeled images using the fixed pseudo labels generated by the baseline detector, and 3) switch to the Teacher-Student training pattern after a number of iterations. We can also repeat this process a few times to further improve performance, each time putting the trained SS-OD model back as the baseline detector and updating the pseudo labels.
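The offline-to-online switch described above can be sketched as follows. This is only a minimal illustration, not the code used in the competition: the names `o2o_phase`, `filter_pseudo_labels`, and `ema_update`, the score threshold, and the use of an exponential moving average (EMA) to update the teacher are all assumptions; the description only states that training moves from fixed pseudo labels to a Teacher-Student pattern.

```python
from typing import Dict, List


def o2o_phase(iteration: int, switch_iter: int) -> str:
    """Return the pseudo-label source for an iteration (hypothetical helper).

    "offline": fixed pseudo labels produced once by the baseline detector.
    "online":  pseudo labels produced on the fly by a teacher model.
    """
    return "offline" if iteration < switch_iter else "online"


def filter_pseudo_labels(detections: List[dict], score_thr: float = 0.7) -> List[dict]:
    """Keep only confident detections as pseudo labels (threshold is an assumption)."""
    return [d for d in detections if d["score"] >= score_thr]


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               momentum: float = 0.999) -> Dict[str, float]:
    """Assumed teacher update: exponential moving average of the student weights."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}


# Toy walk-through with scalar "weights" standing in for real model parameters.
teacher = {"w": 1.0}
student = {"w": 0.0}
for it in range(4):
    if o2o_phase(it, switch_iter=2) == "online":
        # In the online phase the teacher tracks the student after each step.
        teacher = ema_update(teacher, student, momentum=0.9)
```

In a real detector the dictionaries would be replaced by model state, and the student would be trained on strongly augmented images while the teacher labels weakly augmented ones; those details are not specified here.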

We adopted the Cascade R-CNN detector with Swin Transformer as the backbone. Multiple augmentation strategies were used to avoid overfitting and to help detect hard objects. We also designed a two-stage auto ensemble scheme, in which the proposals of all models were fused together and then fed to the ROI heads of each model to produce the final results.
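A rough sketch of the two-stage ensemble idea is given below. It is an illustration under stated assumptions rather than the actual implementation: the callables `proposers` and `roi_heads` are hypothetical stand-ins for each model's proposal stage and ROI heads, and the final class-wise NMS merge is an assumption, since the description does not say how the per-model outputs are combined.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[Box, float, int]        # (box, score, class id)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: Sequence[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy class-wise non-maximum suppression."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) <= iou_thr for k in kept):
            kept.append(det)
    return kept


def two_stage_ensemble(
    image,
    proposers: Sequence[Callable[[object], List[Box]]],
    roi_heads: Sequence[Callable[[object, List[Box]], List[Detection]]],
    iou_thr: float = 0.5,
) -> List[Detection]:
    # Stage 1: fuse the proposals of all models into one shared pool.
    pooled = [box for propose in proposers for box in propose(image)]
    # Stage 2: each model's ROI heads classify and score the same pooled proposals.
    dets = [d for head in roi_heads for d in head(image, pooled)]
    # Assumed merge step: suppress duplicate detections across models.
    return nms(dets, iou_thr)


# Toy usage with two hypothetical models, each proposing one box.
proposers = [lambda img: [(0.0, 0.0, 10.0, 10.0)], lambda img: [(20.0, 20.0, 30.0, 30.0)]]
roi_heads = [
    lambda img, props: [(p, 0.9, 0) for p in props],
    lambda img, props: [(p, 0.8, 0) for p in props],
]
results = two_stage_ensemble(None, proposers, roi_heads)
```

Sharing one proposal pool lets each model's second stage score regions that its own proposal stage may have missed, which is the motivation this kind of scheme usually relies on.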