REAPS: Towards Better Recognition of Fine-grained Images by Region Attending and Part Sequencing
Abstract
Fine-grained image recognition has been a hot research topic in computer vision due to its various applications. State-of-the-art methods are part/region-based approaches that first localize discriminative parts/regions and then learn their fine-grained features. However, these approaches have some inherent drawbacks: 1) the discriminative feature representation of an object is prone to being disturbed by complicated backgrounds; 2) it is unreasonable and inflexible to fix the number of salient parts, because the intended parts may be unavailable under certain circumstances due to occlusion or incompleteness; and 3) the spatial correlation among different salient parts has not been thoroughly exploited (if not completely neglected). To overcome these drawbacks, in this paper we propose a new, simple yet robust method that builds a part sequence model on the attended object region. Concretely, we first alleviate the background effect by using a region attention mechanism to generate the attended region from the original image. Then, instead of localizing different salient parts and extracting their features separately, we learn the part representation implicitly by applying a mapping function to the serialized features of the object. Finally, we combine the region attending network and the part sequence learning network into a unified framework that can be trained end-to-end with only image-level labels. Our extensive experiments on three fine-grained benchmarks show that the proposed method achieves state-of-the-art performance.
[Paper]
Highlights
Contributions
❃ We propose the novel 'soft-part' concept, and implement it by designing a part sequence learning network (PSN), which learns implicit discriminative part representations and captures the spatial context simultaneously.
❃ We apply the region attending network to localize the object region and to alleviate the interference of complicated backgrounds with fine-grained feature representation.
❃ We integrate the region attending network and the part sequence learning network into a unified framework, and train it end-to-end without any part-level annotation.
❃ We conduct extensive experiments on three challenging datasets (Stanford Cars, FGVC-Aircraft and CUB Birds), which demonstrate the superiority of our method over the existing ones.
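The two stages named above (attending to the object region, then serializing its features for a sequence model) can be illustrated with a small toy sketch. This is not the authors' implementation: the thresholding rule in `attended_box`, the row-major ordering in `serialize`, the function names, and the toy data are all assumptions made for illustration, and the learned mapping function that would consume the sequence is omitted.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def attended_box(activation: Sequence[Sequence[float]], ratio: float = 0.5) -> Box:
    """Bounding box of all cells whose activation is >= ratio * max.

    Assumption: a simple threshold stands in for the region attention
    mechanism, which the page does not specify in detail.
    """
    threshold = ratio * max(max(row) for row in activation)
    rows = [r for r, row in enumerate(activation) if any(v >= threshold for v in row)]
    cols = [c for row in activation for c, v in enumerate(row) if v >= threshold]
    return min(rows), min(cols), max(rows), max(cols)


def crop(grid: Sequence[Sequence], box: Box) -> List[List]:
    """Keep only the cells of an H x W grid that fall inside the box."""
    top, left, bottom, right = box
    return [list(row[left:right + 1]) for row in grid[top:bottom + 1]]


def serialize(features: Sequence[Sequence[List[float]]]) -> List[List[float]]:
    """Flatten an H x W grid of feature vectors into a sequence of H*W steps.

    Assumption: row-major order; the actual serialization may differ.
    """
    return [vec for row in features for vec in row]


# Toy 4 x 4 activation map with a bright 2 x 2 "object" in the middle.
activation = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 1.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
# Toy feature map at the same resolution: each cell holds the vector [row, col].
features = [[[float(r), float(c)] for c in range(4)] for r in range(4)]

box = attended_box(activation)
sequence = serialize(crop(features, box))
print(box, len(sequence))
```

In a full model, `sequence` would be fed to a learned sequence module so that part representations and their spatial context are captured implicitly, without fixing the number of parts in advance.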
Recommended Citations
If you find our work helpful in your research, please feel free to cite us:
@inproceedings{zhang2019reaps,
  title={REAPS: Towards Better Recognition of Fine-grained Images by Region Attending and Part Sequencing},
  author={Zhang, Peng and Zhu, Xinyu and Cheng, Zhanzhan and Zhou, Shuigeng and Niu, Yi},
  booktitle={Chinese Conference on Pattern Recognition and Computer Vision (PRCV)},
  pages={193--204},
  year={2019},
  organization={Springer}
}