DMAP: A Distribution mAP for Text

Abstract

Large Language Models (LLMs) are a powerful tool for statistical text analysis, with derived sequences of next-token probability distributions offering a wealth of information. Extracting this signal typically relies on metrics such as perplexity, which do not adequately account for context: how one should interpret a given next-token probability depends on the number of reasonable choices, as encoded by the shape of the conditional distribution. In this work, we present DMAP, a mathematically grounded method that maps a text, via a language model, to a set of samples in the unit interval that jointly encode rank and probability information. This representation enables efficient, model-agnostic analysis and supports a range of applications. We illustrate its utility through three case studies: (i) validating generation parameters to ensure data integrity, (ii) examining the role of probability curvature in machine-generated text detection, and (iii) a forensic analysis revealing statistical fingerprints left in downstream models that have been subject to post-training on synthetic data. Our results demonstrate that DMAP offers a unified statistical view of text that is simple to compute on consumer hardware, is widely applicable, and provides a foundation for further research into text analysis with LLMs.
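The point about context can be made concrete with a toy example. The sketch below is illustrative only: the two next-token distributions, the function names `token_stats` and `unit_interval_sample`, and the mapping itself are assumptions made for this example. The abstract does not specify the DMAP construction, so this should be read as one possible way to place a token in the unit interval using both its rank and its probability, not as the authors' method.

```python
import math
import random


def token_stats(probs, token_index):
    """Return (probability, rank, entropy in nats) for the chosen token.

    Rank 1 is the most probable token; ties are broken by list position.
    """
    p = probs[token_index]
    rank = 1 + sum(
        1 for i, q in enumerate(probs)
        if q > p or (q == p and i < token_index)
    )
    entropy = -sum(q * math.log(q) for q in probs if q > 0)
    return p, rank, entropy


def unit_interval_sample(probs, token_index, rng):
    """Hypothetical rank-and-probability map into [0, 1] (an assumption).

    Tokens are ordered by descending probability. The chosen token owns the
    sub-interval that starts at the total mass of all higher-ranked tokens
    and has length equal to its own probability; a point is drawn uniformly
    from that sub-interval.
    """
    p = probs[token_index]
    mass_above = sum(
        q for i, q in enumerate(probs)
        if q > p or (q == p and i < token_index)
    )
    return mass_above + rng.random() * p


# Two toy next-token distributions in which the observed token has p = 0.2.
flat = [0.2, 0.2, 0.2, 0.2, 0.2]   # five equally reasonable choices
peaked = [0.75, 0.2, 0.05]         # one dominant choice

rng = random.Random(0)
flat_stats = token_stats(flat, 0)
peaked_stats = token_stats(peaked, 1)
flat_sample = unit_interval_sample(flat, 0, rng)
peaked_sample = unit_interval_sample(peaked, 1, rng)

print("flat:  ", flat_stats, flat_sample)
print("peaked:", peaked_stats, peaked_sample)
```

Both tokens contribute the same log-probability to a perplexity computation, yet the first is a top-ranked choice among equals and the second is a runner-up to a dominant alternative. Under the assumed mapping, the first sample falls in [0, 0.2] and the second in [0.75, 0.95], so the two cases are separated even though their probabilities coincide.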

Cite

Text

Kempton et al. "DMAP: A Distribution mAP for Text." International Conference on Learning Representations, 2026.

Markdown

[Kempton et al. "DMAP: A Distribution mAP for Text." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/kempton2026iclr-dmap/)

BibTeX

@inproceedings{kempton2026iclr-dmap,
  title     = {{DMAP: A Distribution mAP for Text}},
  author    = {Kempton, Tom and Rozanova, Julia and Kamalaruban, Parameswaran and Madigan, Maeve and Wresilo, Karolina and Launay, Yoann and Sutton, David and Burrell, Stuart},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/kempton2026iclr-dmap/}
}