Citation-Similarity Relationships in Astrophysics Literature
Abstract
We report a novel observation about which scientific publications are cited more frequently: those that are more textually similar to pre-existing publications. Using bag-of-words document embeddings, we analyze quantitative trends for a large sample of publication abstracts in the field of astrophysics ($N \sim 300{,}000$). When new publications are ranked by how many similar publications already exist in their neighborhood, the median number of citations per year received by publications above the 50$^{\rm th}$ percentile is $\sim 1.6$ times the median for publications below it. When new publications are instead ranked by an alternative metric of dissimilarity to their neighbors, the median number of citations per year received by publications above the 50$^{\rm th}$ percentile is $\sim 0.74$ times the median for publications below it. We discuss a number of hypotheses, relevant to the science of science, that could explain these citation-similarity relationships.
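The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not specify the embedding details, the neighborhood definition, or the dissimilarity metric, so the whitespace tokenizer, the cosine-similarity cutoff `threshold`, and the function names below are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import median


def bag_of_words(text):
    """Map an abstract to a sparse word-count vector (assumed tokenizer: whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def prior_neighbor_counts(abstracts, threshold=0.5):
    """For each abstract, given in publication order, count the earlier
    abstracts whose cosine similarity is at least `threshold`.
    The threshold is a hypothetical stand-in for a neighborhood definition."""
    vectors = [bag_of_words(text) for text in abstracts]
    return [
        sum(cosine(vectors[i], vectors[j]) >= threshold for j in range(i))
        for i in range(len(vectors))
    ]


def median_ratio(scores, citations_per_year):
    """Rank publications by `scores`, split the ranking in half, and return
    median citations per year of the upper half divided by that of the lower half."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    half = len(order) // 2
    lower = [citations_per_year[i] for i in order[:half]]
    upper = [citations_per_year[i] for i in order[half:]]
    return median(upper) / median(lower)
```

Under these assumptions, `median_ratio(prior_neighbor_counts(abstracts), citations_per_year)` gives the kind of upper-half to lower-half ratio that the abstract reports as $\sim 1.6$. The ratio is undefined when the lower half has a median of zero citations per year, so a real analysis would need to handle that case.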
Cite
Text
Imel and Hafen. "Citation-Similarity Relationships in Astrophysics Literature." NeurIPS 2023 Workshops: AI4Science, 2023.
Markdown
[Imel and Hafen. "Citation-Similarity Relationships in Astrophysics Literature." NeurIPS 2023 Workshops: AI4Science, 2023.](https://mlanthology.org/neuripsw/2023/imel2023neuripsw-citationsimilarity/)
BibTeX
@inproceedings{imel2023neuripsw-citationsimilarity,
title = {{Citation-Similarity Relationships in Astrophysics Literature}},
author = {Imel, Nathaniel and Hafen, Zachary},
booktitle = {NeurIPS 2023 Workshops: AI4Science},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/imel2023neuripsw-citationsimilarity/}
}