Award Abstract # 0964102
RI: Medium: Collaborative Research: Semi-Supervised Discriminative Training of Language Models

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: OREGON HEALTH & SCIENCE UNIVERSITY
Initial Amendment Date: June 9, 2010
Latest Amendment Date: March 18, 2014
Award Number: 0964102
Award Instrument: Continuing Grant
Program Manager: Tatiana Korelsky
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: June 1, 2010
End Date: May 31, 2015 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $519,050.00
Funds Obligated to Date: FY 2010 = $229,618.00
FY 2011 = $194,782.00
FY 2012 = $94,650.00
History of Investigator:
  • Alexander Kain (Principal Investigator)
    kaina@ohsu.edu
  • Brian Roark (Former Principal Investigator)
  • Izhak Shafran (Former Principal Investigator)
  • Izhak Shafran (Former Co-Principal Investigator)
  • Richard Sproat (Former Co-Principal Investigator)
Recipient Sponsored Research Office: Oregon Health & Science University
3181 SW SAM JACKSON PARK RD
PORTLAND
OR  US  97239-3011
(503)494-7784
Sponsor Congressional District: 01
Primary Place of Performance: Oregon Health & Science University
3181 SW SAM JACKSON PARK RD
PORTLAND
OR  US  97239-3011
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): NPSNT86JKN51
Parent UEI:
NSF Program(s): International Research Collab,
Robust Intelligence
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT
01001112DB NSF RESEARCH & RELATED ACTIVIT
01001213DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 5940, 7495, 7924
Program Element Code(s): 729800, 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This project is conducting fundamental research in statistical language modeling to improve human language technologies, including automatic speech recognition (ASR) and machine translation (MT).

A language model (LM) is conventionally optimized, using text in the target language, to assign high probability to well-formed sentences. This method has a fundamental shortcoming: the optimization does not explicitly target the kinds of distinctions necessary to accomplish the task at hand, such as discriminating (for ASR) between different words that are acoustically confusable or (for MT) between different target-language words that express the multiple meanings of a polysemous source-language word.

Discriminative optimization of the LM, which would overcome this shortcoming, requires large quantities of paired input-output sequences: speech and its reference transcription for ASR, or source-language (e.g. Chinese) sentences and their translations into the target language (say, English) for MT. Such resources are expensive, which limits the efficacy of discriminative training methods.

In a radical departure from convention, this project is investigating discriminative training using easily available, *unpaired* input and output sequences: un-transcribed speech or monolingual source-language text, and unpaired target-language text. Two key ideas are being pursued: (i) unlabeled input sequences (e.g. speech or Chinese text) are processed to learn likely confusions encountered by the ASR or MT system; (ii) unpaired output sequences (English text) are leveraged to discriminate these well-formed sentences from the (presumably) ill-formed sentences the system could potentially confuse them with.
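The second idea can be illustrated with a minimal sketch: a structured perceptron that learns to rerank a list of confusable hypotheses so that the well-formed sentence scores highest. This is an illustrative toy, not the project's actual system; the bigram features, toy confusion set, and training loop are assumptions made for the example.

```python
# Sketch of discriminative LM training by reranking (structured perceptron).
# All names and data below are illustrative, not from the project's systems.
from collections import defaultdict

def ngram_features(sentence, n=2):
    """Map a sentence (list of words) to sparse n-gram count features."""
    feats = defaultdict(int)
    padded = ["<s>"] + sentence + ["</s>"]
    for i in range(len(padded) - n + 1):
        feats[tuple(padded[i:i + n])] += 1
    return feats

def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights[f] * c for f, c in feats.items())

def perceptron_train(pairs, epochs=5):
    """pairs: list of (reference, hypotheses). The reference is the
    well-formed sentence; the hypotheses are confusable alternatives.
    Standard perceptron update: boost the reference's features,
    penalize the current top-scoring wrong hypothesis's features."""
    w = defaultdict(float)
    for _ in range(epochs):
        for ref, hyps in pairs:
            best = max(hyps, key=lambda h: score(w, ngram_features(h)))
            if best != ref:
                for f, c in ngram_features(ref).items():
                    w[f] += c
                for f, c in ngram_features(best).items():
                    w[f] -= c
    return w

# Toy acoustic confusion set: "recognize speech" vs. "wreck a nice beach".
ref = ["recognize", "speech"]
hyps = [["wreck", "a", "nice", "beach"], ["recognize", "speech"]]
w = perceptron_train([(ref, hyps)])
best = max(hyps, key=lambda h: score(w, ngram_features(h)))
```

After training, the model ranks the well-formed sentence above its confusable alternative; the semi-supervised twist pursued in this project is obtaining such hypothesis lists without paired speech-transcript data.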

This semi-supervised discriminative training, if successful, will advance machine intelligence in fundamental ways that impact many other applications.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


A. Çelebi, H. Sak, E. Dikici, M. Saraçlar, M. Lehr, E.T. Prud'hommeaux, P. Xu, N. Glenn, D. Karakos, S. Khudanpur, B. Roark, K. Sagae, I. Shafran, D. Bikel, C. Callison-Burch, Y. Cao, K. Hall, E. Hasler, P. Koehn, A. Lopez, M. Post, D. Riley, "Semi-supervised discriminative language modeling for Turkish ASR," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012, p. 5025
Barry S. Oken, Umut Orhan, Brian Roark, Deniz Erdogmus, Andrew Fowler, Aimee Mooney, Betts Peters, Meghan Miller, and Melanie B. Fried-Oken, "Brain-computer interface with language model-electroencephalography fusion for locked-in syndrome," Neurorehabilitation and Neural Repair, v.28, 2014
Brian Roark, Russ Beckley, Chris Gibbons, and Melanie Fried-Oken, "Huffman scanning: using language models within fixed-grid keyboard emulation," Computer Speech and Language, v.27, 2013
Izhak Shafran, Richard Sproat, Mahsa Yarmohammadi, and Brian Roark, "Efficient Determinization of Tagged Word Lattices using Categorial and Lexicographic Semirings," Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011, p. 283
K. Sagae, M. Lehr, E.T. Prud'hommeaux, P. Xu, N. Glenn, D. Karakos, S. Khudanpur, B. Roark, M. Saraçlar, I. Shafran, D. Bikel, C. Callison-Burch, Y. Cao, K. Hall, E. Hasler, P. Koehn, A. Lopez, M. Post, D. Riley, "Hallucinated n-best lists for discriminative language modeling," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012, p. 5001
P. Xu, S. Khudanpur, M. Lehr, E.T. Prud'hommeaux, N. Glenn, D. Karakos, B. Roark, K. Sagae, M. Saraçlar, I. Shafran, D. Bikel, C. Callison-Burch, Y. Cao, K. Hall, E. Hasler, P. Koehn, A. Lopez, M. Post, D. Riley, "Continuous space discriminative language modeling," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012, p. 2129
Zhifei Li, Ziyuan Wang, Jason Eisner, Sanjeev Khudanpur, and Brian Roark, "Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation," Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011, p. 920

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project focused on new methods for learning statistical models that can discriminate between sentences that (a) were likely to be produced by a person and (b) were unlikely to be produced by a person.  Such language models are widely used in common human language processing systems, such as automatic speech recognition, machine translation, and optical character recognition.  The models help choose among possible alternative system outputs by biasing the system toward sentences that a person would likely produce.  Certain advanced methods for training discriminative language models require large amounts of data that pair human-produced sentences with the alternative sentences that the system incorrectly outputs in each case.  In many common scenarios, large amounts of such data are simply unavailable.  In this project, we addressed training in scenarios where alternative system outputs either were not readily available, or the correct sentence from among the outputs was unknown.  In these scenarios, new methods for discriminatively training statistical language models were required, and the central intellectual merit of the project resided in creating such methods and validating them in real language processing systems.  Given the growing importance of speech recognition and automatic translation as applications used by millions of people in mobile computing interactions, improvements in language model training methods will broadly impact the quality of human-computer interaction.  In addition, a broader impact of the project is the training received by several Ph.D. students who worked on this project, some of whom have already taken this expertise with them to excellent positions in the field.
New language model training methods were designed and applied to both machine translation and automatic speech recognition tasks, and the change in system accuracy was measured relative to baseline approaches.  In both applications, one of the standard approaches investigated was to train from human-produced language samples (i.e., text in the target language) with no available system outputs, by simulating alternative system outputs for each given sentence.  For machine translation, this simulation can be performed by first translating the target-language string (e.g., in English) to the source language (e.g., Chinese) and then back to the target language again (so-called 'round trip' translation).  In this way, for a given English sentence, there is now a set of possible alternative system outputs.  We found that augmenting fully supervised training data (i.e., sentences with actual system outputs) with this semi-supervised 'simulated' training data yielded improvements in a Chinese-English translation task.  Interestingly, translation-based approaches were also useful for discriminative language modeling for speech recognition.  In this case, we learned to automatically 'translate' from a given real human sentence to simulated system outputs.  Using these machine-translation-derived methods, we were able to train language models that achieved significant speech recognition accuracy improvements over strong baseline systems.  We found that simulation methods based on word or phrase correspondences yielded models that outperformed those trained with simulation methods based on phone-sequence correspondences in an English Broadcast News task.  The methods were shown to yield gains in diverse tasks: similar methods using morpheme sequences instead of word sequences showed strong accuracy improvements in Turkish speech recognition.
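The simulated system outputs described above can be sketched, very roughly, as expanding each word of a real sentence through a confusion table to produce an artificial n-best list. The confusion table and sentences below are invented for illustration; the project derived such confusions automatically (e.g., from round-trip translation or phone-sequence models) rather than from a hand-built dictionary.

```python
# Toy sketch of simulating alternative system outputs from plain
# target-language text. CONFUSIONS is a hypothetical hand-built table;
# in practice such confusions would be learned automatically.
from itertools import product

CONFUSIONS = {
    "speech": ["speech", "beach"],
    "recognize": ["recognize", "wreck a nice"],
    "two": ["two", "to", "too"],
}

def hallucinate(sentence, max_hyps=10):
    """Expand each word into its confusion set and take the cross product,
    yielding a simulated n-best list that includes the original sentence."""
    options = [CONFUSIONS.get(w, [w]) for w in sentence.split()]
    hyps = [" ".join(words) for words in product(*options)]
    return hyps[:max_hyps]

nbest = hallucinate("recognize speech")
# The true sentence and its confusable alternatives now form a training
# instance for a discriminative language model, with no system run needed.
```

Each real sentence thus supplies both a positive example and a set of plausible negative examples, which is what makes discriminative training possible without paired input-output data.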
In addition, we found that deriving correspondences for such simulation systems from string pairs with no knowledge of which string was actually spoken still yielded useful semi-supervised training...

