The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning

Abstract

The advent of powerful transformer-based discriminative language models and, more recently, generative GPT-family models has led to notable advancements in natural language processing (NLP). One prominent task is commonsense reasoning, where performance is usually evaluated through multiple-choice question-answering benchmarks. To date, many such benchmarks have been proposed, and 'leaderboards' tracking state-of-the-art performance on them suggest that transformer-based models are approaching human-level performance. However, due to documented problems such as hallucination and bias, the research focus is shifting from merely quantifying accuracy on the task to in-depth, context-sensitive probing of LLMs' generalization and robustness. To gain deeper insight into these models' performance in commonsense reasoning scenarios, this thesis presents three main studies: the generalization ability of transformer-based language models on commonsense reasoning, the trend in confidence distributions of these models when confronted with ambiguous inference tasks, and a proposed risk-centric evaluation framework for both discriminative and generative language models.

Cite

Text

Shen. "The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I21.30410

Markdown

[Shen. "The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/shen2024aaai-generalization/) doi:10.1609/AAAI.V38I21.30410

BibTeX

@inproceedings{shen2024aaai-generalization,
  title     = {{The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning}},
  author    = {Shen, Ke},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {23419--23420},
  doi       = {10.1609/AAAI.V38I21.30410},
  url       = {https://mlanthology.org/aaai/2024/shen2024aaai-generalization/}
}