Evaluating the Markov assumption in Markov Decision Processes for spoken dialogue management

Bibliographic Details
Published in:	Language Resources and Evaluation vol. 40, no. 1 (Feb 2006), p. 47-66
Main Author:	Paek, Tim
Other Authors:	David Maxwell Chickering
Published:	Springer Nature B.V.
Subjects:	Markov analysis; Decision making models; Verbal communication; Voice communication; Speech; Reinforcement; Beliefs; Computer simulation; Optimization; Markov chains; Simulation; Experiments; Action; Assumptions; Models; Robustness; Alternative approaches
Online Access:	Citation/Abstract; Full Text; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	214793002
003	UK-CbPIL
022			|a 1574-020X
022			|a 1574-0218
022			|a 0010-4817
024	7		|a 10.1007/s10579-006-9008-2 |2 doi
035			|a 214793002
045	2		|b d20060201 |b d20060228
084			|a 15327 |2 nlm
100	1		|a Paek, Tim
245	1		|a Evaluating the Markov assumption in Markov Decision Processes for spoken dialogue management
260			|b Springer Nature B.V. |c Feb 2006
513			|a Journal Article
520	3		|a The goal of dialogue management in a spoken dialogue system is to take actions based on observations and inferred beliefs. To ensure that the actions optimize the performance or robustness of the system, researchers have turned to reinforcement learning methods to learn policies for action selection. To derive an optimal policy from data, the dynamics of the system is often represented as a Markov Decision Process (MDP), which assumes that the state of the dialogue depends only on the previous state and action. In this article, we investigate whether constraining the state space by the Markov assumption, especially when the structure of the state space may be unknown, truly affords the highest reward. In simulation experiments conducted in the context of a dialogue system for interacting with a speech-enabled web browser, models under the Markov assumption did not perform as well as an alternative model which classifies the total reward with accumulating features. We discuss the implications of the study as well as its limitations. [PUBLICATION ABSTRACT]
653			|a Markov analysis
653			|a Decision making models
653			|a Verbal communication
653			|a Voice communication
653			|a Speech
653			|a Reinforcement
653			|a Beliefs
653			|a Computer simulation
653			|a Optimization
653			|a Markov chains
653			|a Simulation
653			|a Experiments
653			|a Action
653			|a Assumptions
653			|a Models
653			|a Robustness
653			|a Alternative approaches
700	1		|a David Maxwell Chickering
773	0		|t Language Resources and Evaluation |g vol. 40, no. 1 (Feb 2006), p. 47-66
786	0		|d ProQuest |t Arts & Humanities Database
856	4	1	|3 Citation/Abstract |u https://www.proquest.com/docview/214793002/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	|3 Full Text |u https://www.proquest.com/docview/214793002/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	|3 Full Text - PDF |u https://www.proquest.com/docview/214793002/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch