Anuvaad: Spoken Language Machine Translation

with Stochastic Finite-State Transducers


In project Anuvaad ("translation" in Sanskrit), we are working to provide multilingual text and speech interfaces to existing applications. These applications range from human-machine dialog systems (e.g., information access systems) to human-human dialog systems (e.g., instant messaging). Our focus in project Anuvaad is to (a) rapidly add multilingual support to existing applications without having to recraft the application for each language and (b) explore techniques for the rapid development of trainable and scalable automatic language translation systems.

Multilingual Enablement of an Existing Application

A crucial component in the multilingual enablement of an existing application is a "Transnizer", a tightly integrated speech recognition and translation system. A prototype system has been developed for Spanish-to-English translation in the context of "How May I Help You?" (HMIHY), a customer care application. The bilingual translation system enables a speaker to converse naturally in Spanish using the existing English language semantic and dialog components of the HMIHY prototype system. We are working to extend the same paradigm to more languages and other speech-enabled services.

A transnizer is a stochastic finite-state transducer that integrates the language model of a speech recognizer and the translation model of a speech translator into a single finite-state transducer. Thus a transnizer directly maps source language phones into target language word sequences. A transnizer can be used in place of the language model of a speech recognizer to obtain a speech translation system. Speech-to-speech translation is achieved in one step, in contrast to the more popular two-step approach of using a translation component as a backend that translates the output of a speech recognizer.
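The one-step idea can be illustrated as the composition of two weighted relations: one mapping phone sequences to source language words (recognition) and one mapping source words to target words (translation). The sketch below uses plain Python dictionaries with invented example entries and costs; real transnizers are built with weighted finite-state transducer tools operating over full lattices, not lookup tables.

```python
# Toy sketch: composing a phone-to-source-word relation with a
# source-to-target translation relation yields a relation that maps
# phones directly to target words in one step. Costs (negative log
# probabilities) add under composition. All entries are invented.

def compose(r1, r2):
    """Compose two weighted relations; keep the best cost per output."""
    result = {}
    for x, outs1 in r1.items():
        best = {}
        for mid, c1 in outs1:
            for y, c2 in r2.get(mid, []):
                cost = c1 + c2
                if y not in best or cost < best[y]:
                    best[y] = cost
        result[x] = sorted(best.items(), key=lambda kv: kv[1])
    return result

# Phone sequences -> Spanish words (recognition), with costs.
recognize = {
    ("k", "w", "e", "n", "t", "a"): [("cuenta", 0.2), ("cuanto", 1.5)],
}

# Spanish words -> English words (translation), with costs.
translate = {
    "cuenta": [("account", 0.1), ("bill", 0.9)],
    "cuanto": [("how much", 0.3)],
}

# The composed relation maps phones directly to English words.
transnize = compose(recognize, translate)
best_output, best_cost = transnize[("k", "w", "e", "n", "t", "a")][0]
print(best_output)  # "account", combined cost 0.2 + 0.1
```

Because composition can be computed once, offline, the decoder at run time searches only the combined transducer rather than chaining a recognizer and a translator.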

The challenge in building a transnizer is to model, using finite-state transducers, the two subproblems of language translation: (a) lexical selection, selecting the appropriate target word or phrase for a given source language word or phrase, and (b) lexical reordering, rearranging the selected target words and phrases into a well-formed target language utterance. The papers below discuss the construction of these finite-state transducers.
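The two subproblems can be made concrete with a small sketch. Here hand-written rules stand in for what the papers learn as finite-state transducers from data; the vocabulary and the noun-adjective reordering rule are invented examples.

```python
# (a) Lexical selection: pick a target word for each source word.
# (b) Lexical reordering: Spanish noun-adjective order becomes
#     English adjective-noun order.
# Hand-written stand-ins for learned transducers; entries are invented.

selection = {"la": "the", "factura": "bill", "nueva": "new"}

def reorder(words, nouns={"bill"}, adjectives={"new"}):
    """Swap adjacent noun-adjective pairs into adjective-noun order."""
    out = list(words)
    for i in range(len(out) - 1):
        if out[i] in nouns and out[i + 1] in adjectives:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

source = ["la", "factura", "nueva"]        # "la factura nueva"
selected = [selection[w] for w in source]  # ["the", "bill", "new"]
target = reorder(selected)                 # ["the", "new", "bill"]
print(" ".join(target))  # the new bill
```

In the actual systems, both steps are weighted transducers, so selection ambiguity and alternative orderings are carried through the search rather than resolved greedily as above.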

Rapid Development of a Machine Translation System

Present-day machine translation systems have been built over a period of decades. These systems involve the manual creation of rules, which is both tedious and time-consuming. In recent times, corpus-based statistical translation has emerged as an alternative paradigm for building machine translation systems. In this paradigm, translation rules are automatically learned from a corpus of source and target language sentence pairs (a parallel corpus). Over the past few years, translation research at AT&T has focused on building different models in the statistical translation paradigm. We have demonstrated translation models for English-Spanish and English-Japanese and have shown them to perform well for spontaneous speech in domain-specific applications, such as "How May I Help You?".

Although the statistical translation paradigm has significantly reduced the time to build a machine translation system, it relies heavily on the availability of a parallel corpus. In project Anuvaad, we address the issue of inducing and deriving parallel corpora with a view to further decreasing the time to build a machine translation system. We induce multiple instances of parallel corpora from a monolingual corpus using off-the-shelf machine translation systems. Each instance of the parallel corpus is viewed as a candidate translation of the corpus. We then use algorithms to combine these translation hypotheses into a "consensus" translation. We have investigated various techniques that arrive at consensus translations using multi-sequence alignment of the hypothesis translations. We have shown that our techniques outperform off-the-shelf translation engines on two applications: a conference registration system and a multilingual instant-messenger system (video below).
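The consensus step can be sketched as aligning the hypotheses and voting at each position. The published approach uses true multi-sequence alignment, which handles insertions and deletions between hypotheses; the sketch below aligns naively by word position, and the example sentences are invented.

```python
# Rough sketch of consensus translation: several MT hypotheses are
# aligned (here, naively by position) and a majority vote is taken in
# each aligned column. Example hypotheses are invented.
from collections import Counter
from itertools import zip_longest

hypotheses = [
    "please confirm your flight reservation".split(),
    "please confirm your flight booking".split(),
    "kindly confirm your flight reservation".split(),
]

consensus = []
for column in zip_longest(*hypotheses, fillvalue=None):
    # Majority vote over the words the hypotheses propose here.
    word, _ = Counter(w for w in column if w is not None).most_common(1)[0]
    consensus.append(word)

print(" ".join(consensus))  # please confirm your flight reservation
```

The vote lets agreement among independent engines override any single engine's errors, which is why the consensus can outperform each individual off-the-shelf system.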


In the videos below, you can see Anuvaad in the context of an operator-customer spoken dialog system. You will see a spoken dialog system that  recognizes and responds to speech utterances from English-speaking customers. With Anuvaad as its front-end, the very same system (without any changes to the back-end) recognizes, translates and  responds to Spanish-speaking customers.
 Long Version (about 5 minutes)
Short Version (about 2 minutes)


In the video below, you will see translations provided by a statistical translation model trained on consensus translations obtained using multistring alignment on the outputs of multiple translation engines. The training data originally consisted of chat conversations that were not restricted to any particular domain. The conversation in the video is between a traveler wishing to travel to a place where he does not speak the local language and a travel agent. (You will need to download and install this codec to play this video.)
 El Hubbub (about 3 minutes)


In 2003, ANUVAAD machine translation technology was used in Mandolin, a prototype Multilingual Speech/Text-based Human-Human Interaction system that was developed by AT&T Government Solutions and demonstrated successfully at the Joint Warrior Interoperability Demonstration (JWID-2003).
 Find out about Mandolin


In 2004, we scaled up ANUVAAD machine translation technology to build translation models on very large corpora (millions of sentence pairs and hundreds of millions of words). We have used ANUVAAD to translate, in real time and with low latency, the output of large vocabulary speech recognition on broadcast news sources. In the following video, you will see the result of English speech recognition, chunk-by-chunk English-to-Spanish machine translation and sentence-level English-to-Spanish machine translation of a CSPAN news item that was broadcast on January 24, 2007 (older version from January 28, 2005).
 Latest (2009) Real-time Broadcast News Machine Translation with Named Entity and Wikipedia access
 Real-time Broadcast News Machine Translation
Older version
 Real-time Broadcast News Machine Translation (about 12.5 minutes)


  • Srinivas Bangalore, Vanessa Murdock and Giuseppe Riccardi, "Bootstrapping Bilingual Data using Consensus Translation for a Multilingual Instant Messaging System", International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, 2002.
  • Srinivas Bangalore, German Bordel and Giuseppe Riccardi, "Computing Consensus Translation from Multiple Machine Translation Systems", ASRU 2001, Italy, December 2001.
  • Srinivas Bangalore and Giuseppe Riccardi, "A Finite-State Approach to Machine Translation", North American ACL 2001 (NAACL-2001), Pittsburgh, May 2001.
  • Srinivas Bangalore and Giuseppe Riccardi, "Finite-State Models for Lexical Reordering in Spoken Language Translation", International Conference on Speech and Language Processing 2000 (ICSLP-2000), Beijing, China, October 2000.
  • Srinivas Bangalore and Giuseppe Riccardi, "Stochastic Finite-State Models for Spoken Language Machine Translation", Workshop on Embedded Machine Translation Systems, Seattle, April 2000.



  • Slides from the presentation at the Workshop on Embedded Machine Translation Systems, Seattle, April 2000.