SumBlogger: Abstractive Summarization for Large Collections of Scientific Articles

Pavlos Zakkas

Zeta Alpha & Leiden University

Suzan Verberne

Leiden Institute of Advanced Computer Science, Leiden University

Jakub Zavrel

Zeta Alpha

SumBlogger is a new prompt-based end-to-end system for extreme summarization of large collections of scientific documents. Digesting scientific information is becoming more challenging, especially in high-volume, fast-paced fields like AI. Our system helps researchers find information faster and stay up to date with their field of interest using summaries generated by instruction-tuned Large Language Models (LLMs). Although LLMs have already been applied to several downstream tasks, including news summarization, their effectiveness in the scientific domain, and especially in summarizing large collections of papers, is underexplored.

The input for SumBlogger is a collection of scientific papers, e.g. thousands of papers published at a large conference. SumBlogger first groups these source documents using a graph-based clustering algorithm and selects representative papers per cluster based on their diversity. It then performs a two-step summarization with an instruction-tuned LLM as backbone: in the first step, single-document summaries (SDS) of the selected papers are generated, and in the second, multi-document summarization (MDS) step, these summaries are aggregated. Finally, the cluster summaries are used to generate a full blog post, with the clusters as sections.
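The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes each paper already has an embedding vector, stands in for the graph-based clustering with connected components of a cosine-similarity threshold graph, approximates diversity-based selection with a greedy farthest-point heuristic, and takes the LLM as an injected `llm` callable with placeholder prompts.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_by_similarity_graph(vectors: List[Vector], threshold: float = 0.8) -> List[List[int]]:
    """Assumed stand-in for graph-based clustering: connected components of the
    graph that links two papers when their cosine similarity >= threshold."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def select_diverse(members: List[int], vectors: List[Vector], k: int) -> List[int]:
    """Assumed diversity heuristic: start from the first member, then repeatedly
    add the paper least similar to everything chosen so far."""
    chosen = [members[0]]
    while len(chosen) < min(k, len(members)):
        rest = [m for m in members if m not in chosen]
        chosen.append(min(rest, key=lambda m: max(cosine(vectors[m], vectors[c]) for c in chosen)))
    return chosen


def summarize_collection(papers: List[str], vectors: List[Vector],
                         llm: Callable[[str], str], k: int = 2,
                         threshold: float = 0.8) -> str:
    """Cluster, select representatives, then run SDS followed by MDS per cluster."""
    sections = []
    for members in cluster_by_similarity_graph(vectors, threshold):
        representatives = select_diverse(members, vectors, k)
        # Step 1 (SDS): one summary per selected paper. Placeholder prompt.
        single = [llm("Summarize this paper in one sentence:\n" + papers[i])
                  for i in representatives]
        # Step 2 (MDS): aggregate the single-document summaries. Placeholder prompt.
        sections.append(llm("Combine these summaries into one paragraph:\n" + "\n".join(single)))
    # Each cluster summary becomes one section of the blog post.
    return "\n\n".join(sections)
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular model provider, and makes the two-step structure easy to compare against a one-step variant that sends the full documents in a single prompt.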

We evaluate the summarization components of our system in isolation using available SDS and MDS datasets in the scientific domain. For the SDS tasks, our system achieves better results than strong baselines: the fine-tuned models in the SciTLDR benchmark. For the MDS tasks, our two-step approach not only leads to a performance gain over a one-step approach where the full content of the source documents is passed in a single prompt to the LLM, but also reaches the performance of state-of-the-art fine-tuned models on the Multi-XScience dataset.

For the end-to-end evaluation of SumBlogger, we performed a small user study to assess the performance of our system in generating blog posts that summarize a whole AI conference. Preference judgments indicate that humans prefer the output obtained by selecting representative papers before prompting the LLM to summarize a cluster over the output obtained from single-document summaries of all the papers in the cluster. Although a human-written blog post is still preferred over the automated one, the users appreciate the informativeness and factuality of SumBlogger.

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023