CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation

Abstract

Unsupervised action segmentation has recently advanced with ASOT, an optimal transport (OT)-based method that simultaneously learns action representations and performs clustering using pseudo-labels. Unlike other OT-based approaches, ASOT makes no assumptions about action ordering and can decode a temporally consistent segmentation from a noisy cost matrix between video frames and action labels. However, the resulting segmentation lacks segment-level supervision, which limits the effectiveness of feedback between frames and action representations. To address this limitation, we propose Closed Loop Optimal Transport (CLOT), a novel OT-based framework with a multi-level cyclic feature learning mechanism. Leveraging its encoder-decoder architecture, CLOT learns pseudo-labels alongside frame and segment embeddings by solving two separate OT problems. It then solves a third OT problem to refine both the frame embeddings and the pseudo-labels through cross-attention between the learned frame and segment embeddings. Experimental results on four benchmark datasets demonstrate the benefits of cyclical learning for unsupervised action segmentation.
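
As a rough illustration of how pseudo-labels can be read off an OT problem over a frame-to-action cost matrix, the sketch below runs plain entropic OT (Sinkhorn iterations) with uniform marginals on random features. It is not the ASOT or CLOT formulation, which the abstract does not spell out. The function name `sinkhorn_pseudo_labels`, the cosine cost, the uniform action marginal, and the values of `eps` and `n_iters` are all assumptions made for the example.

```python
import numpy as np

def sinkhorn_pseudo_labels(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between T frames and K actions (illustrative only).

    Assumes uniform marginals on both sides; this is a simplification
    and not the transport problem solved by ASOT or CLOT.
    """
    T, K = cost.shape
    a = np.full(T, 1.0 / T)        # mass per frame
    b = np.full(K, 1.0 / K)        # mass per action (assumed uniform)
    kernel = np.exp(-cost / eps)   # Gibbs kernel of the cost matrix
    u = np.ones(T)
    v = np.ones(K)
    for _ in range(n_iters):
        u = a / (kernel @ v)       # rescale rows toward marginal a
        v = b / (kernel.T @ u)     # rescale columns toward marginal b
    return u[:, None] * kernel * v[None, :]

# Hypothetical data: 12 frame embeddings and 3 action embeddings in 4-D.
rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 4))
actions = rng.normal(size=(3, 4))

# Cosine distance between every frame and every action embedding.
f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
c = actions / np.linalg.norm(actions, axis=1, keepdims=True)
cost = 1.0 - f @ c.T

plan = sinkhorn_pseudo_labels(cost)
pseudo_labels = plan.argmax(axis=1)  # hard label per frame from the soft plan
```

Each row of `plan` is a soft assignment of one frame to the actions, and taking the row-wise argmax gives a hard pseudo-label. This balanced version forces every action to receive equal mass and has no temporal-consistency term, so it should be read only as a picture of the frame-to-label transport idea.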

Cite

Text

Bueno-Benito and Dimiccoli. "CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation." International Conference on Computer Vision, 2025.

Markdown

[Bueno-Benito and Dimiccoli. "CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/buenobenito2025iccv-clot/)

BibTeX

@inproceedings{buenobenito2025iccv-clot,
  title     = {{CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation}},
  author    = {Bueno-Benito, Elena and Dimiccoli, Mariella},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {10719--10729},
  url       = {https://mlanthology.org/iccv/2025/buenobenito2025iccv-clot/}
}