Attention Beam: An Image Captioning Approach (Student Abstract)

Abstract

The aim of image captioning is to generate a textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines because it requires the ability to comprehend the image (computer vision) and then generate a human-like description of it (natural language understanding). Recently, encoder-decoder-based architectures have achieved state-of-the-art results for image captioning. Here, we present a beam search heuristic on top of the encoder-decoder architecture that gives better-quality captions on three benchmark datasets: Flickr8k, Flickr30k, and MS COCO.
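For readers unfamiliar with the decoding step mentioned in the abstract, the following is a minimal sketch of standard beam search over a caption decoder. It is not the paper's attention-based heuristic, which the abstract does not specify. The names `beam_search`, `next_log_probs`, `toy_decoder`, and the `TOY` probability table are hypothetical and stand in for a trained encoder-decoder model.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, summed log-probability)


def beam_search(
    next_log_probs: Callable[[List[str]], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 20,
    start: str = "<start>",
    end: str = "<end>",
) -> List[Hypothesis]:
    """Plain beam search. `next_log_probs` is a hypothetical stand-in for
    the decoder: given a token prefix, it returns next-token log-probs."""
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, lp in next_log_probs(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        if not candidates:
            break
        # Keep only the `beam_width` best-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len, if any
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Hypothetical toy "decoder": the next-token distribution depends only on
# the previous token. A real captioner would also condition on the image.
TOY = {
    "<start>": {"a": math.log(0.6), "the": math.log(0.4)},
    "a": {"dog": math.log(0.55), "cat": math.log(0.45)},
    "the": {"dog": math.log(0.9), "cat": math.log(0.1)},
    "dog": {"<end>": 0.0},
    "cat": {"<end>": 0.0},
}


def toy_decoder(tokens: List[str]) -> Dict[str, float]:
    return TOY[tokens[-1]]


greedy_caption = beam_search(toy_decoder, beam_width=1)[0]
beam_caption = beam_search(toy_decoder, beam_width=2)[0]
```

With this toy table, a beam width of 1 (greedy decoding) commits to "a" first and ends with "a dog" at probability 0.33, while a beam width of 2 keeps "the" alive long enough to find "the dog" at probability 0.36. This illustrates why wider beams can yield higher-scoring captions than greedy decoding.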

Cite

Text

Shrimal and Chakraborty. "Attention Beam: An Image Captioning Approach (Student Abstract)." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I18.17940

Markdown

[Shrimal and Chakraborty. "Attention Beam: An Image Captioning Approach (Student Abstract)." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/shrimal2021aaai-attention/) doi:10.1609/AAAI.V35I18.17940

BibTeX

@inproceedings{shrimal2021aaai-attention,
  title     = {{Attention Beam: An Image Captioning Approach (Student Abstract)}},
  author    = {Shrimal, Anubhav and Chakraborty, Tanmoy},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {15887--15888},
  doi       = {10.1609/AAAI.V35I18.17940},
  url       = {https://mlanthology.org/aaai/2021/shrimal2021aaai-attention/}
}