Automatic assessment of child read speech with disfluencies

Wieke Harmsen

Radboud University

Martijn Bentum

Radboud University

Yu Bai

Radboud University

Ferdy Hubers

Radboud University

Roeland van Hout

Radboud University

Catia Cucchiarini

Radboud University

Helmer Strik

Radboud University

In learning to read, children go through different stages, ranging from cracking the alphabetic code, acquiring phonemic awareness and letter knowledge, and learning the relationships between letters and sounds, to developing the decoding skills needed to identify individual words, retrieve their meanings, and finally understand written text (Castles et al., 2018).

Children’s decoding skills are measured by primary school teachers on a regular basis. In many Dutch and Flemish primary schools, teachers spend much time administering reading-aloud tests, such as the Cito tests DMT (Drie Minuten Test) (van Til et al., 2018a) and AVI (Analyse van Individualiseringsvormen) (van Til et al., 2018b), to all pupils individually. Developing automatic tools that can support teachers in this difficult and time-consuming task will not only save teachers precious time, but can also help make testing more objective. In addition, automatic tools allow for the collection of more detailed data on reading performance (speech rate, reaction times, number of attempts, and information on the errors made) that a single teacher normally cannot record while testing pupils.

Within the Dutch Automatic Reading Tutor (DART) project, Automatic Speech Recognition (ASR) technology has been developed and evaluated to support reading instruction and assessment, specifically at the stage when children are still acquiring decoding skills. An important aspect of this research was the assessment of decoding skills. For this purpose, a test was developed that closely resembles the DMT, in which children had to read word lists of different levels of difficulty. Two versions of the test were used as pretest and posttest to evaluate the impact of ASR-based reading practice and feedback as provided by DART. A total of 249 children completed both the pretest and the posttest, while 174 children completed only the pretest and 134 children only the posttest.

To compute measures that represent children’s decoding skills, we needed, for each word in the prompt, information about what was read and how long it took to read it. We obtained this information automatically by using different ASR systems. However, since the children were still acquiring their decoding skills, the speech recordings were characterized by disfluencies in the form of repetitions, false starts, spelled reading, skipped words, and unprompted inserted words, which complicated the analysis. In the current study, we investigated the performance of different ASR systems on these highly disfluent speech recordings by comparing the ASR output with manual transcriptions and with word correctness assessments by primary school teachers. We discuss the advantages and disadvantages of the ASR systems and suggest pre- and postprocessing techniques that yield a better interpretation of the ASR output.
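As an illustration of why disfluencies complicate a per-word analysis, the sketch below aligns a prompted word list with time-stamped ASR output and labels each prompt word. It is not the procedure used in the study: the function name `align_prompt`, the status labels, the example words, and the timestamps are all assumptions made for this illustration, and Python's general-purpose `difflib.SequenceMatcher` stands in for whatever alignment method an actual system would use.

```python
from difflib import SequenceMatcher


def align_prompt(prompt, recognized):
    """Align prompt words with ASR output (hypothetical helper).

    prompt:     list of prompted words
    recognized: list of (word, start_seconds, end_seconds) tuples
    Returns (per_word, insertions), where per_word has one dict per
    prompt word and insertions lists recognized words that were not prompted.
    """
    hyp_words = [word for word, _, _ in recognized]
    matcher = SequenceMatcher(a=prompt, b=hyp_words, autojunk=False)
    per_word, insertions = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Prompt word was recognized as read; keep its duration.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                _, start, end = recognized[j]
                per_word.append({"word": prompt[i], "status": "correct",
                                 "duration": round(end - start, 2)})
        elif tag == "delete":
            # Prompt word has no counterpart in the ASR output.
            for i in range(i1, i2):
                per_word.append({"word": prompt[i], "status": "skipped",
                                 "duration": None})
        elif tag == "replace":
            # Something else was recognized in place of the prompt word(s).
            for i in range(i1, i2):
                per_word.append({"word": prompt[i], "status": "misread",
                                 "duration": None,
                                 "recognized": hyp_words[j1:j2]})
        else:  # tag == "insert"
            # Extra recognized material, e.g. a false start or repetition.
            insertions.extend(hyp_words[j1:j2])
    return per_word, insertions


# Invented example: a false start ("ka") and a skipped word ("op").
prompt = ["de", "kat", "zit", "op", "de", "mat"]
recognized = [("de", 0.0, 0.3), ("ka", 0.4, 0.6), ("kat", 0.7, 1.1),
              ("zit", 1.2, 1.6), ("de", 2.0, 2.2), ("mat", 2.3, 2.9)]
words, inserted = align_prompt(prompt, recognized)
for entry in words:
    print(entry)
print("not prompted:", inserted)
```

In this toy case the false start surfaces as an inserted word and the missing word as a skip. Real recordings are harder, because an ASR system may also misrecognize partial or spelled-out words, which is one reason the ASR output has to be checked against manual transcriptions.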

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023