Natural Language Speech and Audio Processing

Domain: Natural Language Speech and Audio Processing
Domain - extra: psycholinguistics, machine learning, corpus linguistics
Year: 2010
Starting: autumn 2010
Status: Open
Subject: Automatic speech transcription error recovery using multiple knowledge sources
Thesis advisor: ADDA-DECKER Martine
Co-advisors: LAMEL Lori, LIMSI/CNRS
VASILESCU Ioana, LIMSI/CNRS
Laboratory: EXT
Collaborations: RWTH Aachen University (Germany),
KIT Karlsruhe (Germany),
LPP/CNRS, Univ. Paris 3
Abstract: Over the past decade, it has been firmly established that human listeners still significantly outperform machines on speech transcription tasks. Native listeners generally cope very well with many aspects of the variation inherent to speech, such as pronunciation variants, disfluencies, ungrammatical sentences, accents and noise. This is particularly true when a large surrounding context (a long sentence) is available. Automatic speech recognition (ASR) systems, by contrast, generally make their transcription decisions over relatively limited contexts (a few words), and handling variation in speech remains a major challenge for them.
Since ASR is an enabling technology for a wide range of advanced applications, such as multimedia information access and speech-to-speech translation, the impact of ASR errors on the performance of these applications will also be investigated.
Context: Handling variation in speech remains a major challenge for current automatic speech recognition (ASR) systems. In particular, casual interactive speech often results in high word error rates, which call for specific error recovery strategies. The rich experimental environment of the Franco-German Quaero project, with annual ASR evaluations in multiple languages, provides a unique testbed for a systematic study of ASR errors. The proposed parallel between human and machine errors is highly innovative and may advance both our fundamental knowledge of human speech processing and the basic techniques for automatic speech processing and error recovery.
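For readers unfamiliar with the metric, the word error rate mentioned above is conventionally the number of word substitutions, deletions and insertions needed to turn the reference transcript into the system output, divided by the number of reference words. The following is a minimal sketch under that standard definition, not the scoring procedure used in the Quaero evaluations; the function name and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> dict:
    """Align two transcripts by edit distance and count error types.

    Illustrative sketch: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (edits, substitutions, deletions, insertions)
    # for aligning ref[:i] with hyp[:j].
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)  # only insertions

    for i in range(1, rows):
        for j in range(1, cols):
            if ref[i - 1] == hyp[j - 1]:
                table[i][j] = table[i - 1][j - 1]  # match, no cost
                continue
            e, s, d, n = table[i - 1][j - 1]
            substitution = (e + 1, s + 1, d, n)
            e, s, d, n = table[i - 1][j]
            deletion = (e + 1, s, d + 1, n)
            e, s, d, n = table[i][j - 1]
            insertion = (e + 1, s, d, n + 1)
            # Tuples compare on the total edit count first.
            table[i][j] = min(substitution, deletion, insertion)

    edits, subs, dels, ins = table[-1][-1]
    return {
        "wer": edits / len(ref),
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
    }


result = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here the hypothesis has one substituted word and one missing word, so two edits are counted against six reference words.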
Objectives: The aim of the present proposal is to identify the current obstacles that affect ASR performance, to propose a sound ASR error typology and benchmark human versus ASR performance according to this typology, to design innovative mechanisms for error recovery, and to explore new solutions for improved spoken language modeling.
Work program: ASR errors need to be investigated along at least three axes:
(i) perceptual experiments on selected materials to benchmark human performance.
(ii) proper names, whose errors are particularly harmful to downstream processing such as information access, translation or question answering (factors: frequency of occurrence, pronunciation variants, repetitions).
(iii) reduced pronunciations (modeling options: specific acoustic models, pronunciation dictionary).
The impact of different knowledge sources (named entities, POS, prosody, pronunciation variants, speaking rate, frequency) will then be applied and evaluated.
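As an illustration of how errors might be broken down along one of these axes, the sketch below compares the error rate on proper names with the error rate on all other reference words. It is only a rough sketch under stated assumptions: the list of proper names is supplied by hand (hypothetical here), and the alignment comes from Python's `difflib.SequenceMatcher`, which is a convenient stand-in rather than the minimum-edit alignment used by standard ASR scoring tools.

```python
from difflib import SequenceMatcher


def category_error_rates(ref_words, hyp_words, proper_names):
    """Share of misrecognized reference words, split by word category.

    proper_names is a hypothetical, hand-supplied set of lowercase names.
    Inserted hypothesis words have no reference counterpart and are ignored.
    """
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    totals = {"proper_name": 0, "other": 0}
    errors = {"proper_name": 0, "other": 0}

    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # tag is one of 'equal', 'replace', 'delete', 'insert'.
        for word in ref_words[i1:i2]:
            category = "proper_name" if word in proper_names else "other"
            totals[category] += 1
            if tag != "equal":
                errors[category] += 1

    return {
        category: (errors[category] / totals[category]) if totals[category] else None
        for category in totals
    }


reference = "the meeting in aachen moved to karlsruhe".split()
hypothesis = "the meeting in often moved to karlsruhe".split()
rates = category_error_rates(reference, hypothesis, {"aachen", "karlsruhe"})
```

The same breakdown could be repeated with other hand-labeled categories (for example part-of-speech classes) to compare how much each contributes to the overall error rate.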
Extra information
Prerequisite
Details: SujetTheseIV.pdf
Expected funding: Research contract
Status of funding: Expected
Candidates: YAHIA Dahbia
User: martine.adda-decker
Created: Tuesday 29 June 2010 16:09:10 CEST
Last modified: Tuesday 29 June 2010 16:09:10 CEST

Attached files

	filename	created	hits	filesize
	SujetTheseIV.pdf	29 Jun 2010 16:09	1294	36.38 Kb

Ecole Doctorale Informatique Paris-Sud

Director
Nicole Bidoit
Assistant
Stéphanie Druetta
Thesis counsellor
Dominique Gouyou-Beauchamps

ED 427 - Université Paris-Sud
UFR Sciences Orsay
Building 650 - north wing - 417
Tel: 01 69 15 63 19
Fax: 01 69 15 63 87
Email: ed-info at lri.fr