Syllable-, Bigram-, and Morphology-Driven Pseudoword Generation in Greek

Bibliographic Details
Published in: Applied Sciences vol. 15, no. 12 (2025), p. 6582
Main author: Kosmidis Kosmas
Other authors: Apostolouda Vassiliki, Revithiadou Anthi
Published: MDPI AG
Description
Abstract: SyBig-r-Morph is a versatile pseudoword generator designed for Greek that can be readily adapted to other languages. It lets researchers produce phonotactically and morphologically well-formed pseudowords tailored to particular morphosyntactic categories, such as nouns or verbs, and thereby overcomes the shortcomings of current multilingual generators. The tool is especially valuable for designing controlled linguistic experiments, including studies on stress assignment, lexical access, and morphophonological and lexical processing. By linking orthographic representation to phonological realization, a key step in the text-to-speech pipeline, SyBig-r-Morph serves psycholinguistic research, computational phonology, and speech synthesis applications that require linguistically authentic pseudoword stimuli.
Pseudowords are essential in (psycho)linguistic research because they offer a way to study language without interference from meaning. Various methods for creating them exist, but each has limitations. Traditional approaches modify existing words, which risks unintended recognition of the source word. Modern algorithmic methods use high-frequency n-grams or syllable deconstruction but often require specialized expertise. Currently, no automatic pseudoword generation process is designed explicitly for Greek, which is our primary focus. We therefore developed SyBig-r-Morph, a novel application that constructs pseudowords using syllables as the main building block, replicating Greek phonotactic patterns. SyBig-r-Morph draws its input from word lists and databases that include syllabification, word length, part of speech, and frequency information. It categorizes syllables by position to ensure phonotactic consistency with user-selected morphosyntactic categories and can optionally assign stress to generated words. Additionally, the tool uses multiple lexicons to eliminate phonologically invalid combinations. Its modular architecture allows easy adaptation to other languages. To further evaluate its output, we conducted a manual assessment using a tool that verifies phonotactic well-formedness based on phonological parameters derived from a corpus. Most SyBig-r-Morph words passed the stricter phonotactic criteria, confirming the tool’s sound design and linguistic adequacy.
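Illustrative note: the position-based syllable recombination described in the abstract can be sketched roughly as follows. This is not the SyBig-r-Morph implementation. The function names, the toy lexicon, the three-way initial/medial/final split, and the use of the lexicon only to reject real words are all assumptions made for illustration; stress, part of speech, and frequency are ignored.

```python
import random
from collections import defaultdict

# Hypothetical toy lexicon of syllabified Greek words (stress marks omitted).
TOY_LEXICON = [
    ["νε", "ρο"], ["κα", "λος"], ["δρο", "μος"],
    ["πα", "τα", "τα"], ["τρα", "πε", "ζι"], ["βι", "βλι", "ο"],
    ["μο", "λυ", "βι"], ["πο", "τα", "μι"],
]


def build_position_table(syllabified_words):
    """Group syllables into 'initial', 'medial', and 'final' sets."""
    table = defaultdict(set)
    for syllables in syllabified_words:
        if len(syllables) < 2:
            continue  # assumption: monosyllables are skipped in this sketch
        table["initial"].add(syllables[0])
        table["final"].add(syllables[-1])
        for syllable in syllables[1:-1]:
            table["medial"].add(syllable)
    return table


def generate_pseudowords(syllabified_words, n_syllables, count, seed=0, max_tries=1000):
    """Recombine syllables by position, rejecting strings that are real words."""
    if n_syllables < 2:
        return []
    table = build_position_table(syllabified_words)
    # One slot per syllable position; sorted lists keep the output reproducible.
    slots = (
        [sorted(table["initial"])]
        + [sorted(table["medial"])] * (n_syllables - 2)
        + [sorted(table["final"])]
    )
    if any(not slot for slot in slots):
        return []  # not enough data to fill every position
    real_words = {"".join(syllables) for syllables in syllabified_words}
    rng = random.Random(seed)
    results, seen = [], set()
    for _ in range(max_tries):
        if len(results) >= count:
            break
        candidate = "".join(rng.choice(slot) for slot in slots)
        if candidate in real_words or candidate in seen:
            continue  # skip attested words and duplicates
        seen.add(candidate)
        results.append(candidate)
    return results


if __name__ == "__main__":
    print(generate_pseudowords(TOY_LEXICON, n_syllables=3, count=5, seed=1))
```

A fuller version would, as the abstract indicates, also condition on part of speech and frequency and check candidates against several lexicons for phonotactic validity; those steps are omitted here.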
ISSN: 2076-3417
DOI: 10.3390/app15126582
Source: Publicly Available Content Database