Automatic induction of language model data for a spoken dialogue system

Saved in:
Bibliographic Details
Published in: Language Resources and Evaluation vol. 40, no. 1 (Feb 2006), p. 25-46
Main Author: Wang, Chao
Other Authors: Chung, Grace; Seneff, Stephanie
Published: Springer Nature B.V.
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 214800583
003 UK-CbPIL
022 |a 1574-020X 
022 |a 1574-0218 
022 |a 0010-4817 
024 7 |a 10.1007/s10579-006-9007-3  |2 doi 
035 |a 214800583 
045 2 |b d20060201  |b d20060228 
084 |a 15327  |2 nlm 
100 1 |a Wang, Chao 
245 1 |a Automatic induction of language model data for a spoken dialogue system 
260 |b Springer Nature B.V.  |c Feb 2006 
513 |a Journal Article 
520 3 |a In this paper, we address the issue of generating in-domain language model training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. In the second stage, two sampling methods are explored to filter the synthetic corpus to achieve a desired probability distribution of the semantic content, both on the sentence level and on the class level. The first method utilizes user simulation technology, which obtains the probability model via an interplay between a probabilistic user model and the dialogue system. The second method synthesizes novel dialogue interactions from the raw data by modelling after a small set of dialogues produced by the developers during the course of system refinement. Evaluation is conducted on recognition performance in a restaurant information domain. We show that a partial match to usage-appropriate semantic content distribution can be achieved via user simulations. Furthermore, word error rate can be reduced when limited amounts of in-domain training data are augmented with synthetic data derived by our methods. [PUBLICATION ABSTRACT] 
610 4 |a Massachusetts Institute of Technology 
653 |a Models 
653 |a Linguistics 
653 |a Discourse analysis 
653 |a Semantics 
653 |a Data analysis 
653 |a Syntax semantics relationship 
653 |a Sampling methods 
653 |a Corpus analysis 
653 |a Corpus linguistics 
653 |a Statistical analysis 
653 |a Computer simulation 
653 |a Logic 
653 |a Training 
653 |a Sentences 
653 |a Error analysis 
653 |a Human-computer interaction 
653 |a Simulation 
653 |a Utterances 
653 |a Probability 
653 |a Induction 
653 |a Sampling 
653 |a Data 
653 |a Restaurants 
700 1 |a Chung, Grace 
700 1 |a Seneff, Stephanie 
773 0 |t Language Resources and Evaluation  |g vol. 40, no. 1 (Feb 2006), p. 25-46 
786 0 |d ProQuest  |t Arts & Humanities Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/214800583/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/214800583/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/214800583/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
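
Note: the abstract (MARC field 520) describes a second stage in which a synthetic corpus is filtered by sampling so that the distribution of semantic content matches a desired target. The paper obtains that target via user simulation and developer dialogues; the snippet below is only a minimal illustrative sketch of distribution-matched sampling under invented class labels, target probabilities, and function names, not the authors' implementation.

```python
# Hypothetical sketch (not from the paper): resample a synthetic corpus so the
# relative frequency of each semantic class approaches a target distribution.
import random
from collections import Counter

def sample_to_target(corpus, target, size, seed=0):
    """corpus: list of (sentence, semantic_class); target: {class: probability}."""
    rng = random.Random(seed)
    by_class = {}
    for sent, cls in corpus:
        by_class.setdefault(cls, []).append(sent)
    sampled = []
    for cls, prob in target.items():
        pool = by_class.get(cls, [])
        want = int(round(prob * size))
        if pool:
            # sample with replacement so under-represented classes still fill their quota
            sampled.extend((rng.choice(pool), cls) for _ in range(want))
    rng.shuffle(sampled)
    return sampled

if __name__ == "__main__":
    # Invented restaurant-domain examples for illustration only.
    synthetic = [
        ("show me cheap thai restaurants in back bay", "query_cuisine"),
        ("what is the phone number of that place", "query_phone"),
        ("are any of them open after midnight", "query_hours"),
    ] * 100
    target = {"query_cuisine": 0.5, "query_phone": 0.3, "query_hours": 0.2}
    filtered = sample_to_target(synthetic, target, size=1000)
    print(Counter(cls for _, cls in filtered))
```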