Counterfactual Debiasing for Physical Audiovisual Commonsense Reasoning

Abstract

Physical commonsense is an essential aspect of human cognition, involving an intuitive understanding of the physical properties and interactions of everyday objects and materials. Though physical commonsense reasoning should inherently be a multisensory task, integrating both video and audio signals, existing physical audiovisual commonsense reasoning (PACR) models predominantly rely on visual information. This reliance leads to spurious correlations and undermines the models’ reasoning and generalization abilities. To counteract this, we introduce a model-agnostic Counterfactual Physical Audiovisual Commonsense Reasoning (CF-PACR) framework aimed at mitigating the spurious effects induced by visual bias. Specifically, we construct a traditional PACR model using both audio and visual information as the factual reasoning model. Subsequently, in the counterfactual reasoning model, we isolate visual information to estimate direct effects. Finally, we subtract the direct effects from the total effects across modalities to derive indirect effects, thereby mitigating visual biases. Extensive experiments validate the effectiveness and generalizability of CF-PACR in alleviating the spurious correlations between the visual modality and model predictions.
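The subtraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion function, whether effects are computed on logits or probabilities, or any scaling, so the function name `debiased_logits`, the `alpha` weight, and the toy numbers below are all assumptions made for illustration.

```python
import numpy as np


def debiased_logits(total_logits, visual_only_logits, alpha=1.0):
    """Subtract the visual-only (direct) effect from the audiovisual (total) effect.

    Hypothetical helper: `alpha` is an assumed scaling knob, not something
    stated in the abstract. With alpha=1.0 this is a plain subtraction.
    """
    total_logits = np.asarray(total_logits, dtype=float)
    visual_only_logits = np.asarray(visual_only_logits, dtype=float)
    return total_logits - alpha * visual_only_logits


# Toy example with two candidate answers (made-up scores, not real results).
# Factual branch: scores from a model that sees both audio and video.
total = np.array([[2.0, 1.5]])
# Counterfactual branch: scores from the visual input alone.
visual_only = np.array([[1.5, 0.2]])

# Indirect effect: what remains once the visual shortcut is removed.
indirect = debiased_logits(total, visual_only)

biased_choice = int(total.argmax(axis=-1)[0])
debiased_choice = int(indirect.argmax(axis=-1)[0])
print(biased_choice, debiased_choice)
```

In this toy case the factual model prefers the first answer mostly because the visual-only branch scores it highly; after subtraction the second answer wins, which is the kind of correction the framework is aiming for.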

Cite

Text

Zong et al. "Counterfactual Debiasing for Physical Audiovisual Commonsense Reasoning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I14.33675

Markdown

[Zong et al. "Counterfactual Debiasing for Physical Audiovisual Commonsense Reasoning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zong2025aaai-counterfactual/) doi:10.1609/AAAI.V39I14.33675

BibTeX

@inproceedings{zong2025aaai-counterfactual,
  title     = {{Counterfactual Debiasing for Physical Audiovisual Commonsense Reasoning}},
  author    = {Zong, Daoming and Ding, Chaoyue and Chen, Kaitao and Li, Yinsheng and Wang, Shuaiyu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {15265--15273},
  doi       = {10.1609/AAAI.V39I14.33675},
  url       = {https://mlanthology.org/aaai/2025/zong2025aaai-counterfactual/}
}