Citation-Similarity Relationships in Astrophysics Literature

Abstract

We report a novel observation about which scientific publications are cited more frequently: those that are more textually similar to pre-existing publications. Using bag-of-words document embeddings, we analyze quantitative trends for a large sample of publication abstracts in the field of astrophysics ($N \sim 300{,}000$). When new publications are ranked by how many similar publications already exist in their neighborhood, the median number of citations per year that the upper 50$^{\rm th}$ percentile receives is $\sim 1.6$ times the median of the lower 50$^{\rm th}$ percentile. When new publications are ranked by an alternative metric of dissimilarity to neighbors, the median number of citations per year that the upper 50$^{\rm th}$ percentile receives is $\sim 0.74$ times the median of the lower 50$^{\rm th}$ percentile. We discuss a number of hypotheses, relevant to the science of science, that could explain these citation-similarity relationships.
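The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not specify how the neighborhood or the dissimilarity metric is defined, so the cosine-similarity threshold, the function names (`bag_of_words`, `prior_neighbor_counts`, `median_ratio`), and the use of raw word counts are all assumptions made for illustration.

```python
import re
from collections import Counter

import numpy as np


def bag_of_words(texts):
    """Return a (documents x vocabulary) matrix of raw word counts."""
    tokenized = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(texts), len(vocab)))
    for row, doc in enumerate(tokenized):
        for word, n in Counter(doc).items():
            counts[row, index[word]] = n
    return counts


def prior_neighbor_counts(vectors, years, threshold=0.5):
    """Count, for each document, the earlier documents that are similar to it.

    ASSUMPTION: "similar" means cosine similarity >= threshold; the source
    abstract does not state the neighborhood definition.
    """
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    sim = unit @ unit.T
    years = np.asarray(years)
    earlier = years[None, :] < years[:, None]  # earlier[i, j]: j precedes i
    return ((sim >= threshold) & earlier).sum(axis=1)


def median_ratio(scores, citations_per_year):
    """Median citations/year of the upper half by score over the lower half."""
    scores = np.asarray(scores, dtype=float)
    cites = np.asarray(citations_per_year, dtype=float)
    order = np.argsort(scores, kind="stable")  # ascending by score
    half = len(order) // 2
    lower, upper = order[:half], order[half:]
    return np.median(cites[upper]) / np.median(cites[lower])
```

Under these assumptions, passing the output of `prior_neighbor_counts` as `scores` gives a ratio analogous to the reported $\sim 1.6$, and passing a dissimilarity score instead gives one analogous to the reported $\sim 0.74$. The ratio is undefined when the lower half has a median of zero citations per year, which a real analysis would need to handle.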

Cite

Text

Imel and Hafen. "Citation-Similarity Relationships in Astrophysics Literature." NeurIPS 2023 Workshops: AI4Science, 2023.

Markdown

[Imel and Hafen. "Citation-Similarity Relationships in Astrophysics Literature." NeurIPS 2023 Workshops: AI4Science, 2023.](https://mlanthology.org/neuripsw/2023/imel2023neuripsw-citationsimilarity/)

BibTeX

@inproceedings{imel2023neuripsw-citationsimilarity,
  title     = {{Citation-Similarity Relationships in Astrophysics Literature}},
  author    = {Imel, Nathaniel and Hafen, Zachary},
  booktitle = {NeurIPS 2023 Workshops: AI4Science},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/imel2023neuripsw-citationsimilarity/}
}