Improving Unsupervised Constituency Parsing via Maximizing Semantic Information

Junjie Chen, Xiangheng He, Yusuke Miyao, Danushka Bollegala

ICLR 2025

Abstract

Unsupervised constituency parsers organize phrases within a sentence into a tree-shaped syntactic constituent structure that reflects the organization of sentence semantics. However, the traditional objective of maximizing sentence log-likelihood (LL) does not explicitly account for the close relationship between the constituent structure and the semantics, resulting in a weak correlation between LL values and parsing accuracy. In this paper, we introduce a novel objective that trains parsers by maximizing SemInfo, the semantic information encoded in constituent structures. We introduce a bag-of-substrings model to represent the semantics and estimate the SemInfo value using the probability-weighted information metric. We apply the SemInfo maximization objective to training Probabilistic Context-Free Grammar (PCFG) parsers and develop a Tree Conditional Random Field (TreeCRF)-based model to facilitate the training. Experiments show that SemInfo correlates more strongly with parsing accuracy than LL, establishing SemInfo as a better unsupervised parsing objective. As a result, our algorithm significantly improves parsing accuracy, by an average of 7.85 sentence-F1 points across five PCFG variants and four languages, achieving state-of-the-art-level results in three of the four languages.
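
To make the bag-of-substrings idea concrete, here is a minimal sketch that enumerates every contiguous token span of a sentence and assigns each span a weight. The abstract does not spell out the estimator, so the weighting below is an assumption: a generic tf-idf-style score used as an illustrative stand-in, not the paper's probability-weighted information formula. The function names `bag_of_substrings` and `tfidf_style_weights`, the toy sentences, and the whitespace tokenization are all hypothetical.

```python
from collections import Counter
from math import log


def bag_of_substrings(tokens, max_len=None):
    """Count every contiguous token span (substring) of a tokenized sentence."""
    n = len(tokens)
    max_len = n if max_len is None else min(max_len, n)
    bag = Counter()
    for start in range(n):
        # Spans starting at `start` with length 1..max_len, clipped to the sentence.
        for end in range(start + 1, min(start + max_len, n) + 1):
            bag[tuple(tokens[start:end])] += 1
    return bag


def tfidf_style_weights(target_bag, corpus_bags):
    """Assumed stand-in weighting: relative frequency times log(N / df).

    This is a generic tf-idf-style score, NOT the paper's estimator.
    """
    n_docs = len(corpus_bags)
    total = sum(target_bag.values())
    weights = {}
    for span, count in target_bag.items():
        # Document frequency: how many bags contain this span at all.
        df = sum(1 for bag in corpus_bags if span in bag)
        idf = log(n_docs / df) if df else 0.0
        weights[span] = (count / total) * idf
    return weights


# Toy usage with two hypothetical sentences.
sentences = ["the cat sat".split(), "the dog ran".split()]
bags = [bag_of_substrings(s) for s in sentences]
weights = tfidf_style_weights(bags[0], bags)
```

Under this stand-in weighting, a span that occurs in every sentence (here `("the",)`) gets weight zero, while spans specific to one sentence get positive weight. A sentence of n tokens yields n(n+1)/2 spans, so `max_len` can cap the cost for long inputs.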

Cite

Text

Chen et al. "Improving Unsupervised Constituency Parsing via Maximizing Semantic Information." International Conference on Learning Representations, 2025.

Markdown

[Chen et al. "Improving Unsupervised Constituency Parsing via Maximizing Semantic Information." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/chen2025iclr-improving/)

BibTeX

@inproceedings{chen2025iclr-improving,
  title     = {{Improving Unsupervised Constituency Parsing via Maximizing Semantic Information}},
  author    = {Chen, Junjie and He, Xiangheng and Miyao, Yusuke and Bollegala, Danushka},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/chen2025iclr-improving/}
}