Integrating AuChAnn into SASTA

Jan Odijk

Utrecht University

SASTA is an application that automatically analyses transcripts of spontaneous language in accordance with the TARSP (young children), STAP (older children) or ASTA (patients with aphasia) methods. SASTA fully supports so-called CHAT annotations (MacWhinney 2000), which can be used to enrich a (possibly quite deviant) utterance to make clear how the utterance was intended. For example, a child may have uttered “deze hok”, but the transcriber indicates by means of CHAT annotations that the child intended to say “dit is een hok”: deze [: dit] 0is 0een hok. SASTA applies all CHAT annotations, which yields “dit is een hok”, parses the resulting sentence, and then applies “backplacement”: it replaces “dit” again by “deze”, and deletes the inserted words “is” and “een” in the syntactic structure, so that the utterance “deze hok” has a syntactic structure based on “dit is een hok”. With this syntactic structure SASTA correctly analyses this utterance as containing a subject and a complement (TARSP code OndVC) and a substantively used demonstrative pronoun (TARSP code AVn), and not as an NP with “deze” as determiner (TARSP code deze/dieNP).
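The apply-then-backplace cycle can be illustrated with a minimal Python sketch. It is not SASTA's implementation: the function names `apply_chat` and `backplace` are hypothetical, only the two CHAT notations from the example (single-word replacement and omitted word) are handled, and a flat word list stands in for the syntactic structure on which SASTA actually performs backplacement.

```python
import re

# Handles only the two CHAT notations used in the example:
#   word [: replacement]  -> the intended word is "replacement"
#   0word                 -> "word" was omitted by the speaker
TOKEN = re.compile(r"(\S+) \[: ([^\]\s]+)\]|(\S+)")


def apply_chat(annotated):
    """Return the intended words plus a record of what was changed."""
    intended = []
    edits = []  # (position in intended, kind, originally uttered word)
    for match in TOKEN.finditer(annotated):
        original, replacement, plain = match.groups()
        position = len(intended)
        if replacement is not None:
            intended.append(replacement)
            edits.append((position, "replace", original))
        elif plain.startswith("0") and len(plain) > 1:
            intended.append(plain[1:])
            edits.append((position, "delete", None))
        else:
            intended.append(plain)
    return intended, edits


def backplace(intended, edits):
    """Undo the recorded edits: restore replaced words, drop inserted ones."""
    restored = list(intended)
    for position, kind, original in edits:
        restored[position] = original if kind == "replace" else None
    return [word for word in restored if word is not None]


intended, edits = apply_chat("deze [: dit] 0is 0een hok")
print(" ".join(intended))                    # the sentence that is parsed
print(" ".join(backplace(intended, edits)))  # the utterance as spoken
```

In this sketch the recorded positions refer to the intended sentence, so the same record could in principle be used to locate the corresponding nodes in a parse of that sentence.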

Unfortunately, many researchers and clinical professionals do not use CHAT annotations, inter alia because the CHAT annotation system is very rich and complicated. Instead, they add a separate explanation to the actual utterance, e.g., utterance = “deze hok”, explanation = “dit is een hok”. This makes it impossible for SASTA to obtain the correct analysis, irrespective of whether SASTA uses the utterance or the explanation.

Frank Wijnen and colleagues created AuChAnn, which combines an utterance and an explanation automatically into a CHAT-annotated utterance: utterance = “deze hok” and explanation = “dit is een hok” results in the CHAT-annotated utterance “deze [: dit] 0is 0een hok”.
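The idea of deriving a CHAT-annotated utterance from an utterance and its explanation can be sketched as follows. This is not AuChAnn's algorithm or API: the function `annotate` is hypothetical, the alignment is a naive word-level one based on Python's standard `difflib`, and the sketch deliberately refuses cases in which uttered words have no counterpart in the explanation.

```python
from difflib import SequenceMatcher


def annotate(utterance, explanation):
    """Naively combine an utterance and its explanation into CHAT notation."""
    said = utterance.split()
    meant = explanation.split()
    out = []
    matcher = SequenceMatcher(a=said, b=meant, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old, new = said[i1:i2], meant[j1:j2]
        if tag == "equal":
            out.extend(old)
            continue
        if len(old) > len(new):
            # Assumption of this sketch: such cases are out of scope.
            raise NotImplementedError(
                "uttered words without a counterpart are not handled")
        # Pair differing words position by position as replacements ...
        for uttered, intended in zip(old, new):
            out.append(f"{uttered} [: {intended}]")
        # ... and mark the remaining intended words as omitted.
        for intended in new[len(old):]:
            out.append(f"0{intended}")
    return " ".join(out)


print(annotate("deze hok", "dit is een hok"))
```

Pairing differing words purely by position is the weakest part of this sketch; a realistic aligner would have to decide which explanation word, if any, an uttered word is a variant of.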

In this poster I will show what effect the integration of AuChAnn into SASTA has. For a dedicated test sample containing only examples with an utterance and a separate explanation, it improves the SASTA analysis results by more than 24 percentage points for recall and by more than 10 percentage points for precision. In almost all transcripts that contain an utterance and a separate explanation for some utterances, an improvement in both recall and precision is observed. In a few cases, the analysis results deteriorate, and I will discuss the reasons for this in the poster. I will also show that AuChAnn, under its default settings, produces a less felicitous CHAT-annotated utterance for some utterances, which can be remedied by changing AuChAnn’s settings. Finally, I will describe how the results can probably be improved even further by adapting the “backplacement” procedure.

SASTA: https://sasta.hum.uu.nl

AuChAnn: https://github.com/UUDigitalHumanitieslab/auchann

CLIN33

The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)

UAntwerpen City Campus: Building R

Rodestraat 14, Antwerp, Belgium

22 September 2023