Salient Object Detection with Dynamic Convolutions

Abstract

Convolutional Neural Networks (CNNs) rely on content-independent convolution operations that extract features shared across the entire dataset, limiting their adaptability to individual inputs. In contrast, input-dependent architectures like Vision Transformers (ViTs) can adapt to the specific characteristics of each input. To enhance input adaptability in CNNs, we propose SODDCNet, an encoder-decoder architecture for Salient Object Detection (SOD) that employs large convolutions with weights generated dynamically via the self-attention mechanism. Additionally, unlike other CNN architectures, we utilize multiple large kernels in parallel to segment salient objects of various sizes. To pre-train the proposed model, we combine the COCO and OpenImages semantic segmentation datasets into a 3.18M-image dataset for SOD. Comprehensive quantitative experiments on benchmark datasets demonstrate that SODDCNet performs competitively with state-of-the-art methods in SOD and Video SOD.
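To make the core idea concrete, below is a minimal sketch of an input-dependent ("dynamic") large-kernel depthwise convolution in PyTorch. Here the per-channel kernel weights are generated from globally pooled features of each input rather than learned as fixed parameters. This is an illustrative approximation only: the class name `DynamicDepthwiseConv` and the pooling-plus-linear weight generator are assumptions, not the paper's actual self-attention-based mechanism or the SODDCNet architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicDepthwiseConv(nn.Module):
    """Depthwise convolution whose k x k kernels are generated per input.

    Hypothetical sketch: a real dynamic-convolution layer (as in the paper)
    would derive the weights via self-attention rather than a pooled linear map.
    """

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        # Maps pooled per-channel statistics to one k x k kernel per channel.
        self.weight_gen = nn.Linear(channels, channels * kernel_size * kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        pooled = x.mean(dim=(2, 3))                       # (B, C) global context
        weights = self.weight_gen(pooled)                 # (B, C*k*k), input-dependent
        weights = weights.view(b * c, 1, self.kernel_size, self.kernel_size)
        # Fold the batch into the channel axis so each sample gets its own kernels.
        x = x.reshape(1, b * c, h, w)
        out = F.conv2d(x, weights, padding=self.kernel_size // 2, groups=b * c)
        return out.view(b, c, h, w)

x = torch.randn(2, 8, 32, 32)
layer = DynamicDepthwiseConv(channels=8, kernel_size=7)
y = layer(x)
print(y.shape)  # torch.Size([2, 8, 32, 32])
```

Running several such layers with different `kernel_size` values in parallel and merging their outputs would loosely mirror the paper's use of multiple large kernels to capture salient objects of various sizes.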

Cite

Text

Dulam and Kambhamettu. "Salient Object Detection with Dynamic Convolutions." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Dulam and Kambhamettu. "Salient Object Detection with Dynamic Convolutions." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/dulam2025cvprw-salient/)

BibTeX

@inproceedings{dulam2025cvprw-salient,
  title     = {{Salient Object Detection with Dynamic Convolutions}},
  author    = {Dulam, Rohit Venkata Sai and Kambhamettu, Chandra},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {1692--1702},
  url       = {https://mlanthology.org/cvprw/2025/dulam2025cvprw-salient/}
}