Segmentation of Tweets with URLs and Its Applications to Sentiment Analysis

Abstract

An important means of disseminating information on social media platforms is including URLs that point to external sources in user posts. On Twitter, we estimate that about 21% of the daily stream of English-language tweets contain URLs. We notice that NLP tools make little attempt at understanding the relationship between the content of the URL and the text surrounding it in a tweet. In this work, we study the structure of tweets with URLs relative to the content of the Web documents pointed to by the URLs. We identify several segment classes that may appear in a tweet with URLs, such as the title of a Web page and the user's original content. Our goals in this paper are to introduce, define, and analyze the segmentation problem of tweets with URLs; to develop an effective algorithm to solve it; and to show that our solution can benefit sentiment analysis on Twitter. We also show that the problem is an instance of the block edit distance problem, and thus NP-hard.

Cite

Text

Aljebreen et al. "Segmentation of Tweets with URLs and Its Applications to Sentiment Analysis." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I14.17480

Markdown

[Aljebreen et al. "Segmentation of Tweets with URLs and Its Applications to Sentiment Analysis." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/aljebreen2021aaai-segmentation/) doi:10.1609/AAAI.V35I14.17480

BibTeX

@inproceedings{aljebreen2021aaai-segmentation,
  title     = {{Segmentation of Tweets with URLs and Its Applications to Sentiment Analysis}},
  author    = {Aljebreen, Abdullah and Meng, Weiyi and Dragut, Eduard C.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {12480-12488},
  doi       = {10.1609/AAAI.V35I14.17480},
  url       = {https://mlanthology.org/aaai/2021/aljebreen2021aaai-segmentation/}
}