Salient Object Detection with Dynamic Convolutions
Abstract
Convolutional Neural Networks (CNNs) rely on content-independent convolution operations that extract features shared across the entire dataset, limiting their adaptability to individual inputs. In contrast, input-dependent architectures like Vision Transformers (ViTs) can adapt to the specific characteristics of each input. To enhance input adaptability in CNNs, we propose SODDCNet, an encoder-decoder architecture for Salient Object Detection (SOD) that employs large convolutions with dynamically generated weights via the self-attention mechanism. Additionally, unlike other CNN architectures, we utilize multiple large kernels in parallel to segment salient objects of various sizes. To pre-train the proposed model, we combine the COCO and OpenImages semantic segmentation datasets to create a 3.18M image dataset for SOD. Comprehensive quantitative experiments conducted on benchmark datasets demonstrate that SODDCNet performs competitively compared to state-of-the-art methods in SOD and Video SOD.
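The page does not include code, but the core idea — convolution kernels generated per input via an attention-style mechanism, with multiple large kernels applied in parallel — can be illustrated with a minimal NumPy sketch. Everything below (the kernel-bank mixing scheme, function names, and sizes) is an assumption for illustration, not the authors' implementation:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_depthwise_conv(x, weight_bank, query_proj):
    """Depthwise convolution whose kernel is mixed per input (illustrative).

    x           : (C, H, W) input feature map
    weight_bank : (K, C, k, k) bank of K candidate depthwise kernels
    query_proj  : (C, K) projection from the pooled descriptor to mixing logits
    """
    # Attention-style weight generation: pool the input to a descriptor,
    # project to logits, softmax over the K candidate kernels.
    descriptor = x.mean(axis=(1, 2))                      # (C,)
    attn = softmax(descriptor @ query_proj)               # (K,), sums to 1
    kernel = np.einsum('k,kcij->cij', attn, weight_bank)  # (C, k, k)

    # Depthwise convolution with 'same' padding (naive reference loop).
    C, H, W = x.shape
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[:, i, j] = (xp[:, i:i + k, j:j + k] * kernel).sum(axis=(1, 2))
    return out

# Parallel branches with different large kernel sizes, fused by summation,
# loosely mirroring the multi-kernel idea for objects of various sizes.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
branches = []
for ksize in (7, 11):                                  # two assumed "large" sizes
    bank = rng.standard_normal((4, 8, ksize, ksize)) * 0.1
    proj = rng.standard_normal((8, 4)) * 0.1
    branches.append(dynamic_depthwise_conv(x, bank, proj))
y = sum(branches)                                      # fused branch output
```

Because the softmax mixing depends on the pooled input descriptor, two different inputs generally receive two different effective kernels, which is the input adaptability the abstract contrasts with standard content-independent convolutions.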
Cite
Dulam and Kambhamettu. "Salient Object Detection with Dynamic Convolutions." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

BibTeX
@inproceedings{dulam2025cvprw-salient,
title = {{Salient Object Detection with Dynamic Convolutions}},
author = {Dulam, Rohit Venkata Sai and Kambhamettu, Chandra},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2025},
  pages = {1692--1702},
url = {https://mlanthology.org/cvprw/2025/dulam2025cvprw-salient/}
}