Research Highlights

[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

Shihua Huang, Zhichao Lu, Ran Cheng*, Cheng He

Abstract:

Recent advances in deep neural networks have led to remarkable progress in dense image prediction. However, most existing approaches neglect feature alignment for the sake of simplicity. Direct pixel-wise addition of upsampled and local features produces feature maps with misaligned contexts, which in turn translate into misclassifications, especially on object boundaries. In this paper, we propose a feature alignment module that learns transformation offsets of pixels to contextually align upsampled higher-level features, and a feature selection module that emphasizes lower-level features with rich spatial details. We then integrate these two modules into a top-down pyramidal architecture and present the Feature-aligned Pyramid Network (FaPN). Extensive experimental evaluations on four dense prediction tasks and four datasets demonstrate the efficacy of FaPN, yielding an overall improvement of 1.2 – 2.6 points in AP / mIoU over FPN when paired with Faster / Mask R-CNN. In particular, FaPN achieves a state-of-the-art 56.7% mIoU on ADE20K when integrated within MaskFormer. [Source Code]
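The two ideas in the abstract can be illustrated with a toy sketch in plain Python. It is not the authors' implementation: all function names are hypothetical, the maps are single-channel lists, and the offsets and channel logits are set by hand, whereas in FaPN they are learned. It only shows why resampling the upsampled map with per-pixel offsets before addition can remove a misaligned edge, and how a per-channel gate can re-weight features.

```python
import math

def upsample_nearest(feat, scale=2):
    """Nearest-neighbour upsampling of a 2D map given as a list of rows."""
    return [[feat[y // scale][x // scale]
             for x in range(len(feat[0]) * scale)]
            for y in range(len(feat) * scale)]

def bilinear_sample(feat, y, x):
    """Sample a 2D map at a fractional location, clamping to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy

def align(upsampled, offsets):
    """Resample every pixel at (y + dy, x + dx).
    Assumption: offsets are supplied by hand here; FaPN learns them."""
    return [[bilinear_sample(upsampled,
                             y + offsets[y][x][0],
                             x + offsets[y][x][1])
             for x in range(len(upsampled[0]))]
            for y in range(len(upsampled))]

def select(channels, logits):
    """Scale each channel by sigmoid(logit), a hand-set stand-in for a
    learned channel-importance gate."""
    gates = [1.0 / (1.0 + math.exp(-l)) for l in logits]
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(channels, gates)]

def fuse(low, high):
    """Element-wise addition of two maps of equal size."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(low, high)]

# Coarse map with an edge in the middle; detail map with the edge one pixel left.
coarse = [[0, 1]]
low = [[0, 1, 1, 1],
       [0, 1, 1, 1]]
up = upsample_nearest(coarse)                # [[0, 0, 1, 1], [0, 0, 1, 1]]
offsets = [[(0.0, 1.0)] * 4 for _ in range(2)]  # shift sampling one pixel right
aligned = align(up, offsets)

fused_plain = fuse(low, up)         # edge column disagrees: [0, 1, 2, 2]
fused_aligned = fuse(low, aligned)  # edges coincide: [0.0, 2.0, 2.0, 2.0]
```

In the plain fusion, column 1 receives a response from the detail map only, which is the kind of context mismatch the abstract attributes to direct addition. After resampling with the offsets, both maps place the edge in the same column.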

Results

Ablation Study

Table 1: The Ablative Analysis: Comparing the performance of our FaPN with other variants on Cityscapes for semantic segmentation.

Boundary Prediction Analysis

Table 2: Segmentation Performance around Boundaries: Comparing the performance of our FaPN with the original FPN in terms of mIoU over boundary pixels on Cityscapes val with different thresholds on boundary pixels.  
Figure 1: Visualization of the input (upsampled features) to and the output (aligned features) from our FAM.
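A boundary-restricted mIoU of the kind reported in Table 2 could be computed along the following lines. This is a sketch under stated assumptions, not the paper's evaluation protocol: this page does not give the exact boundary definition, so the code treats a pixel as a boundary pixel when it lies within `radius` pixels (Chebyshev distance) of a ground-truth label change. The names `boundary_mask` and `boundary_miou` are hypothetical.

```python
def boundary_mask(labels, radius):
    """Mark pixels within `radius` of a ground-truth label change.
    Assumption: an edge pixel is one whose 4-neighbour has a different label."""
    h, w = len(labels), len(labels[0])
    edge = [[any(0 <= y + dy < h and 0 <= x + dx < w
                 and labels[y + dy][x + dx] != labels[y][x]
                 for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))
             for x in range(w)]
            for y in range(h)]
    # Dilate the edge map by `radius` pixels in every direction.
    return [[any(edge[yy][xx]
                 for yy in range(max(0, y - radius), min(h, y + radius + 1))
                 for xx in range(max(0, x - radius), min(w, x + radius + 1)))
             for x in range(w)]
            for y in range(h)]

def boundary_miou(pred, gt, radius, num_classes):
    """Mean IoU over classes, counting only pixels inside the boundary mask."""
    mask = boundary_mask(gt, radius)
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for y in range(len(gt)):
            for x in range(len(gt[0])):
                if not mask[y][x]:
                    continue
                p, g = pred[y][x] == c, gt[y][x] == c
                inter += p and g
                union += p or g
        if union:  # skip classes absent from the boundary region
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")

gt = [[0, 0, 1, 1]]
pred = [[0, 1, 1, 1]]   # one pixel misclassified right at the boundary
narrow = boundary_miou(pred, gt, radius=0, num_classes=2)
wide = boundary_miou(pred, gt, radius=1, num_classes=2)
```

A narrower band concentrates the score on the pixels closest to the boundary, so an error there weighs more heavily than it does with a wider band.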

Object Detection

Table 3: Object Detection: Performance comparisons on MS COCO val set between FPN and FaPN.

Semantic Segmentation

Table 4: Semantic Segmentation: Performance comparisons on Cityscapes val set between FPN and FaPN.
Table 5: Comparison to SOTA on (a) ADE20K val and (b) COCO-Stuff-10K test. We report both single-scale (s.s.) and multi-scale (m.s.) semantic segmentation performance.

Instance Segmentation

Table 6: Instance Segmentation: Performance comparisons on MS COCO val set between FPN and FaPN.

Panoptic Segmentation

Table 7: Panoptic Segmentation: Performance comparisons on MS COCO val set between FPN and FaPN.

Real-time Semantic Segmentation

Table 8: Real-time semantic segmentation on (a) Cityscapes and (b) COCO-Stuff-10K.

Citation

@inproceedings{huang2021fapn,
  title={FaPN: Feature-aligned Pyramid Network for Dense Image Prediction},
  author={Huang, Shihua and Lu, Zhichao and Cheng, Ran and He, Cheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61903178, 61906081, and U20A20306) and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386).