Curriculum Conditioned Diffusion for Multimodal Recommendation

Abstract

Multimodal recommendation (MMRec) aims to integrate multimodal information of items to address the inherent data sparsity issue in collaborative filtering-based recommendation. Traditional MMRec methods typically capture structure-level item representations from the observed user behaviors within the multimodal graph, overlooking the potential impact of negative instances on personalized preference understanding. In light of the outstanding generative ability and step-by-step inference characteristic of Diffusion Models (DMs), we propose a Curriculum Conditioned Diffusion framework for Multimodal Recommendation (CCDRec), which captures the modality-aware, distribution-level correlation among modalities and integrates the reverse phase of DMs into negative sampling to highlight the most suitable instances in a curricular manner. Specifically, CCDRec proposes the Diffusion-controlled Multimodal Aligning module (DMA) to align multimodal knowledge with collaborative signals by capturing the fine-grained relationships among modalities in the probabilistic distribution space. Furthermore, CCDRec designs the Negative-sensitive Diffusive Inferring module (NDI) to progressively synthesize a negative sample pool with diverse hardness to support the subsequent knowledge-aware negative sampling. To gradually ramp up the training complexity, CCDRec further introduces a Curricular Negative Sampler (CNS) to align the curriculum learning paradigm with the reverse phase of NDI, thereby adaptively sampling the gold-standard negative instances to enhance optimization. Extensive experiments on three datasets with four diverse backbones demonstrate the effectiveness and robustness of CCDRec. The visualization analyses also clarify the underlying mechanism of DMA in multimodal representation alignment and of CNS in curricular negative discovery. The code and the corresponding dataset will be uploaded in the Appendix.
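
The idea of sampling negatives "in a curricular manner" can be illustrated with a generic easy-to-hard schedule. The sketch below is not the paper's CNS and does not implement any diffusion step. It assumes a pool of candidate negative embeddings is already available, scores hardness by the dot product with a user embedding, and widens the eligible part of the pool linearly over epochs. All function names, the hardness measure, and the linear schedule are hypothetical choices made for illustration.

```python
import math
import random


def hardness_scores(user_vec, candidate_vecs):
    """Dot product of the user vector with each candidate negative.

    Assumption for illustration: a higher score means a harder negative,
    because the candidate looks more like something the user would like.
    """
    return [sum(u * c for u, c in zip(user_vec, cand)) for cand in candidate_vecs]


def curriculum_fraction(epoch, total_epochs, start=0.2):
    """Fraction of the easiest-first pool that is eligible at this epoch.

    Grows linearly from `start` at epoch 0 to 1.0 at the last epoch.
    """
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start + (1.0 - start) * progress


def sample_negative(user_vec, candidate_vecs, epoch, total_epochs, rng=random):
    """Return the index of one negative drawn from the currently eligible pool."""
    scores = hardness_scores(user_vec, candidate_vecs)
    # Sort candidate indices from easiest (lowest score) to hardest.
    order = sorted(range(len(candidate_vecs)), key=lambda i: scores[i])
    # Early epochs expose only the easiest candidates; later epochs expose all.
    k = max(1, math.ceil(curriculum_fraction(epoch, total_epochs) * len(order)))
    return rng.choice(order[:k])


# Toy pool of five 2-d candidate embeddings (made-up values).
user = [1.0, 0.0]
pool = [[0.9, 0.0], [0.1, 0.0], [0.5, 0.0], [0.3, 0.0], [0.7, 0.0]]
early = sample_negative(user, pool, epoch=0, total_epochs=10)
late = sample_negative(user, pool, epoch=9, total_epochs=10)
```

With this toy pool, the first epoch can only draw the single easiest candidate, while the last epoch draws uniformly from the whole pool, including the hardest one.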

Cite

Text

Yang et al. "Curriculum Conditioned Diffusion for Multimodal Recommendation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I12.33422

Markdown

[Yang et al. "Curriculum Conditioned Diffusion for Multimodal Recommendation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/yang2025aaai-curriculum/) doi:10.1609/AAAI.V39I12.33422

BibTeX

@inproceedings{yang2025aaai-curriculum,
  title     = {{Curriculum Conditioned Diffusion for Multimodal Recommendation}},
  author    = {Yang, Yimeng and Ma, Haokai and Meng, Lei and Xu, Shuo and Xie, Ruobing and Meng, Xiangxu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {13035-13043},
  doi       = {10.1609/AAAI.V39I12.33422},
  url       = {https://mlanthology.org/aaai/2025/yang2025aaai-curriculum/}
}