Retrieval Augmented Generation of Tabular Answers at Query Time for Domain-Specific Question Answering

Irene Papadopoulou

University of Amsterdam

Jakub Zavrel

Zeta Alpha

Paul Groth

Zeta Alpha

Question-answering systems typically generate answers in the form of sentences or paragraphs, but some situations call for a more structured representation of information. We address the challenge of table generation for domain-specific question answering: given a collection of documents containing the answers, our objective is to present the relevant documents and the extracted answers in a tabulated or aggregated format. To accomplish this, we propose a novel five-step pipeline called Search-Filter-Generate. We leverage the GPT-3.5 family of Large Language Models and employ prompt-engineering techniques such as zero-shot, few-shot, and few-shot Chain-of-Thought (CoT) prompting to generate precise, factual answers. Additionally, we investigate the influence of context passages on answer generation by comparing sparse retrieval, dense retrieval, and reranking models. We evaluate the system on Qasper, a question-answering dataset of scientific papers. The results show that GPT-3.5 models, augmented with passages retrieved with a state-of-the-art cross-encoder, perform comparably to fine-tuned models designed specifically for this task. Furthermore, we show that few-shot CoT prompting improves performance on the Evidence Selection task. Finally, we conduct a preliminary user study to assess the productivity impact of this question-answering system. The findings reveal a significant average time reduction of 55.4% and an average answer-quality improvement of 23.9%.
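The abstract does not spell out the five pipeline steps or the prompts, so the following is only a minimal Python sketch of the general search, filter, and generate idea ending in a table, under stated assumptions. BM25 (with a Lucene-style idf) stands in for sparse retrieval; no dense retriever or cross-encoder reranker is shown. A score threshold stands in for filtering. The LLM is a hypothetical callable `llm(prompt) -> str`, replaced here by a stub that echoes the top-ranked passage. The prompt template, `answer_table`, `to_markdown`, and the toy papers are illustrative names, not the authors' code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; a real system would use a proper tokenizer.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Score each passage against the query with Okapi BM25 (sparse retrieval)."""
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def build_prompt(question, passages, examples=()):
    """Zero-shot prompt when `examples` is empty, few-shot otherwise (hypothetical template)."""
    parts = [f"Context: {c}\nQuestion: {q}\nAnswer: {a}\n" for q, c, a in examples]
    parts.append(f"Context: {chr(10).join(passages)}\nQuestion: {question}\nAnswer:")
    return "\n".join(parts)


def answer_table(questions, papers, llm, top_k=2, min_score=0.0, examples=()):
    """Search, filter, and generate per paper; returns one row (dict) per paper."""
    table = []
    for title, passages in papers.items():
        row = {"paper": title}
        for q in questions:
            scores = bm25_scores(q, passages)
            # Search: keep the top_k passages by BM25 score.
            ranked = sorted(zip(scores, passages), key=lambda x: x[0], reverse=True)[:top_k]
            # Filter: drop passages whose score does not exceed the threshold.
            evidence = [p for s, p in ranked if s > min_score]
            # Generate: ask the (hypothetical) LLM, or mark the cell unanswerable.
            row[q] = llm(build_prompt(q, evidence, examples)) if evidence else "unanswerable"
        table.append(row)
    return table


def to_markdown(table, questions):
    """Render the rows as a Markdown table: papers as rows, questions as columns."""
    lines = ["| paper | " + " | ".join(questions) + " |",
             "|" + " --- |" * (len(questions) + 1)]
    for row in table:
        lines.append("| " + " | ".join([row["paper"]] + [row[q] for q in questions]) + " |")
    return "\n".join(lines)


def stub_llm(prompt):
    # Stand-in for a GPT-3.5 call: echo the first passage of the final context block.
    return prompt.rsplit("Context: ", 1)[1].split("\n", 1)[0]


papers = {  # toy data, not from Qasper
    "Paper A": ["We evaluate on the Qasper dataset.", "The model uses a cross-encoder reranker."],
    "Paper B": ["Our experiments use the SQuAD dataset.", "We train a dense retriever."],
}
questions = ["Which dataset is used?", "What learning rate is used?"]
table = answer_table(questions, papers, stub_llm)
print(to_markdown(table, questions))
```

Replacing `stub_llm` with a real model call and passing `examples` turns the zero-shot prompt into a few-shot one; the threshold in the filter step is what lets a cell read "unanswerable" instead of forcing the model to answer from irrelevant context.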

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023