Parsing Literary Texts: A Comparison of Parsers

Joris J. van Zundert

Huygens Institute, Amsterdam

Marijn Koolen

Huygens Institute, Amsterdam

Carsten Schnober

EScience Center, Amsterdam

The Huygens Institute-based project “Impact and Fiction” aims to relate the texts of novels to reader experiences reported in online reviews (Boot, Koolen, and Van Zundert 2020). A prerequisite for this analytical task is a corpus of novels that is linguistically parsed as accurately as possible. We use a corpus of more than 19,000 Dutch-language novels accessible through the National Library, comprising original Dutch and translated texts published between 2015 and 2019 and representing genres ranging from little-valued but much-read romance to highly acclaimed literature.

We found few reports indicating which parser best suits which domain for Dutch-language sources (e.g., Plank and Van Noord 2010), and we were unable to identify work comparing parsers for Dutch literary texts specifically. However, we found that commonly applied parsers yield different results on literary texts. Arguably this has to do with the specific makeup of texts in the literary domain, where linguistic creativity and subject originality are often promoted (cf. Morley 2007:25). High variance in sentence length, neologisms, unusual sentence structures, free indirect speech, and so forth may all be quirks of literary style. Parsers, however, are trained predominantly on non-literary texts such as news items, message boards, and other internet texts (Bandy and Vincent 2021). Thus, for literary researchers it is useful to know which parser can be considered "best of breed" for literary texts, without having to run a full evaluation themselves.

We report on a comparison of parsing results from five state-of-the-art parsers, both rule-based and neural-network-based. In particular, we compare lemmatization accuracy, POS-tagging accuracy, and parsing speed for Trankit (Nguyen et al. 2021), Frog (Van den Bosch et al. 2007), Alpino (Van Noord, n.d.), spaCy (Honnibal 2015), and Stanza (Qi et al. 2020). For our evaluation, we sampled 500 sentences from our corpus, parsed them with all five parsers, and had the results for each sentence checked by two human annotators. We then computed inter-annotator agreement and analyzed the error rate of each parser.
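To make the kind of comparison involved concrete, the sketch below (an illustration, not the project's actual pipeline) runs two of the compared parsers, spaCy and Stanza, over a sample sentence, lists their lemma and UPOS tags side by side, and shows how inter-annotator agreement could be computed with Cohen's kappa. The model name nl_core_news_sm, the example sentence, and the annotator judgments are assumptions for demonstration.

```python
# Minimal sketch (assumed setup, not the study's evaluation code):
# parse one Dutch sentence with spaCy and Stanza and compare lemma/POS output.
# Requires: pip install spacy stanza scikit-learn
#           python -m spacy download nl_core_news_sm
import spacy
import stanza
from sklearn.metrics import cohen_kappa_score

sentence = "De roman werd in korte tijd een onverwacht succes."  # illustrative example

# spaCy: the small Dutch model is an assumption; any Dutch pipeline would do.
nlp_spacy = spacy.load("nl_core_news_sm")
spacy_tags = [(t.text, t.lemma_, t.pos_) for t in nlp_spacy(sentence)]

# Stanza: download the Dutch models once, then build a tokenize/POS/lemma pipeline.
stanza.download("nl", verbose=False)
nlp_stanza = stanza.Pipeline("nl", processors="tokenize,pos,lemma", verbose=False)
stanza_tags = [(w.text, w.lemma, w.upos)
               for s in nlp_stanza(sentence).sentences for w in s.words]

# Side-by-side view of where the two parsers agree or differ.
# Note: zip assumes both parsers tokenize identically, which does not always hold.
for spacy_tok, stanza_tok in zip(spacy_tags, stanza_tags):
    marker = "" if spacy_tok[1:] == stanza_tok[1:] else "  <-- differs"
    print(f"{spacy_tok}  |  {stanza_tok}{marker}")

# Inter-annotator agreement over correct/incorrect judgments could be computed
# with Cohen's kappa; the judgment vectors below are made up for illustration.
annotator_a = [1, 1, 0, 1, 0]
annotator_b = [1, 0, 0, 1, 0]
print("kappa:", cohen_kappa_score(annotator_a, annotator_b))
```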

All parsers show high accuracy: our human annotators report fewer errors than the error rates the parser developers themselves report. The trade-off between speed and accuracy is unsurprising, with spaCy being the fastest but slightly less accurate, and Alpino the slowest but most accurate.
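How such a speed comparison might be measured is sketched below under stated assumptions (again illustrative, not the code used for the study): each parser is timed over the same batch of sentences and throughput is reported in sentences per second. The spaCy model name and the sample sentences are assumptions.

```python
# Hedged sketch: time one parser over a batch of sentences and report throughput.
import time
import spacy

sentences = ["De roman werd een succes.",
             "Zij las het boek in één nacht uit."]  # illustrative sample batch

nlp = spacy.load("nl_core_news_sm")
start = time.perf_counter()
for doc in nlp.pipe(sentences):          # nlp.pipe processes documents in batches
    _ = [(t.lemma_, t.pos_) for t in doc]
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:.1f} sentences/second")
```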

In our presentation we will discuss the merit of the observed accuracy. It may indicate that literary texts are relatively easy to parse. An alternative explanation, however, is the human ability to interpret ambiguous lemmatization and POS tags, which renders human annotation more lenient than an unforgiving ground truth. We will detail which errors are typically made by which parser and discuss their potential causes. Are typical errors connected to certain types of sentences or genres, for instance, or rather to the type of parser? Or should we instead point to the thorough editing that published novels routinely undergo as a cause of the low error rates?

## References

Bandy, Jack, and Nicholas Vincent. 2021. “Addressing ‘Documentation Debt’ in Machine Learning Research: A Retrospective Datasheet for BookCorpus.” arXiv:2105.05241. https://arxiv.org/abs/2105.05241.

Boot, Peter, Marijn Koolen, and Joris J. van Zundert. 2020. “ImpFic - Impact of Fiction.” Huygens Institute. https://impactandfiction.huygens.knaw.nl/wp-content/uploads/2022/03/ImpFic-Impact-of-Fiction-Proposal-clean-1.pdf.

Honnibal, Matthew. 2015. “Introducing SpaCy.” February 19, 2015. https://explosion.ai/blog/introducing-spacy.

Morley, David. 2007. The Cambridge Introduction to Creative Writing. Cambridge: Cambridge University Press.

Nguyen, Minh Van, Viet Lai, Amir Pouran Ben Veyseh, and Thien Huu Nguyen. 2021. “Trankit: A Light-Weight Transformer-Based Toolkit for Multilingual Natural Language Processing.” In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. Online: Association for Computational Linguistics. https://arxiv.org/pdf/2101.03289.pdf.

Plank, Barbara, and Gertjan Van Noord. 2010. “Dutch Dependency Parser Performance Across Domains.” In Proceedings of the 20th Meeting of Computational Linguistics in the Netherlands, 123–138. Utrecht. https://clinjournal.org/CLIN_proceedings/XX/plank.pdf.

Qi, Peng, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning. 2020. “Stanza: A Python Natural Language Processing Toolkit for Many Human Languages.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 101–108. Stroudsburg: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-demos.14.

Van den Bosch, A., G.J. Busser, W. Daelemans, and S. Canisius. 2007. “An Efficient Memory-Based Morphosyntactic Tagger and Parser for Dutch.” In Proceedings of the 17th Computational Linguistics in the Netherlands Meeting, edited by F. Van Eynde, P. Dirix, I. Schuurman, and V. Vandeghinste, 191–206. Leuven: LOT. https://www.clinjournal.org/CLIN_proceedings/XVII/vandenbosch.pdf.

Van Noord, Gertjan. n.d. “Alpino.” Alpino. Accessed June 13, 2023. https://www.let.rug.nl/vannoord/alp/Alpino/.

--

The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023