
Are there neural machine translation methods that, for one input sentence, output multiple alternative sentences in the target language? It is quite possible that a sentence in the source language has multiple meanings, and it is not desirable for the neural network to discard some of those meanings when no context for disambiguation is provided. How can multiple outputs be accommodated in an encoder-decoder architecture, or is a different architecture required?

I am aware of only one work, https://arxiv.org/abs/1805.10844 (and one reference therein), but I am still digesting whether their network outputs multiple sentences or whether it just accommodates variations during the training phase.

TomR

1 Answer


The LSTM seq2seq model typically used for language translation actually can output multiple variations, and does in many implementations.

The decoder stage outputs a probability for each word in the vocabulary at every step, so it is possible to run the decoder multiple times, taking a different random sample on each run and thus producing a different output sentence.
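Here is a minimal sketch of that sampling idea. It is not tied to any particular toolkit: `decoder_step`, `bos_id` and `eos_id` are hypothetical placeholders for your decoder's next-token distribution function and its special token ids.

```python
import numpy as np

def sample_translation(decoder_step, context, bos_id, eos_id, max_len=50, rng=None):
    """Generate one translation by sampling each next token from the decoder's
    probability distribution instead of always taking the argmax."""
    rng = rng or np.random.default_rng()
    tokens = [bos_id]
    for _ in range(max_len):
        probs = decoder_step(context, tokens)      # hypothetical: returns (vocab_size,) probabilities
        next_id = rng.choice(len(probs), p=probs)  # random draw, not argmax
        tokens.append(int(next_id))
        if next_id == eos_id:
            break
    return tokens

# Running this several times with the same encoder context generally yields
# different token sequences, because each run makes different random draws:
# variants = [sample_translation(decoder_step, context, BOS, EOS) for _ in range(5)]
```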

Not only is it possible, it is actually done by translation engines. A technique called beam search keeps multiple candidate translations in progress and eventually picks one to display, based on whether the model predicts an overall high or low probability for each full sentence (this allows a sentence that starts with an unusual initial word, but that is a better choice in the context of what is being translated, to win out). It would be a minor change to existing systems to display a selection of these candidates instead of only the best one.

Technically, you don't even need to sample randomly: you can take the top N words by probability at each step, re-score the partial sentences on their combined probabilities each time, and cull the low-probability partial sentences in favour of continuing the high-probability ones; see the sketch below.
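As a sketch of that top-N idea (assuming the same hypothetical `decoder_step` as above), a basic beam search can simply return every hypothesis it kept, which directly gives the multiple alternative translations the question asks about:

```python
import numpy as np

def beam_search(decoder_step, context, bos_id, eos_id, beam_size=5, max_len=50):
    """Keep the `beam_size` most probable partial sentences at each step and
    return all hypotheses, ranked by cumulative log-probability."""
    beams = [([bos_id], 0.0)]  # (tokens, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            probs = decoder_step(context, tokens)     # hypothetical: (vocab_size,) probabilities
            top_ids = np.argsort(probs)[-beam_size:]  # top-N next words for this hypothesis
            for idx in top_ids:
                candidates.append((tokens + [int(idx)], score + float(np.log(probs[idx]))))
        # Cull: keep only the highest-scoring partial sentences overall.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos_id:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    return sorted(finished + beams, key=lambda b: b[1], reverse=True)
```

Displaying the top few entries of this list, rather than only the first, is the "minor change" mentioned above.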

Neil Slater
  • What was meant by "taking different samples"? Where is this source of variability? The decoder receives the same input and the same context (from the encoder). – TomR Nov 15 '18 at 21:57
  • 1
    @TomR: The decoder does not output absolute word choices, it outputs word probabilities. Instead of taking the max probability as the only choice, sample according to the probabilities, using a random number generator. In addition the decoder accepts the previous word and state that it output as input for the next item in the sequence, so as soon as the sentence is started with a different word, this sets the translation on a (possibly very) different trajectory. – Neil Slater Nov 15 '18 at 21:58
  • 1
    @TomR: Technically you also don't need to sample, just take the top N probabilities on each step, and recalculate each time on combined probabilities, culling partial sentences that have low probability in favour of continuing more of the high probability sentences. That works just fine for BEAM search – Neil Slater Nov 15 '18 at 22:03
  • Many thanks. But is there some published work about such an approach, or is it folklore shared only among experts (I am a novice in this field)? Maybe there are some good keywords. Google gives a lot of answers for "multiple output neural network", but not for NMT. – TomR Nov 15 '18 at 22:06
  • 1
    @TomR: I'm pretty sure there are published works. However, I got my understanding of this from Andrew Ng's course Sequence Models. This is a relevant lecture https://www.coursera.org/learn/nlp-sequence-models/lecture/v2pRn/picking-the-most-likely-sentence – Neil Slater Nov 15 '18 at 22:08
  • I know I am very late to this party, but the ideal search terms are "beam search nmt". One relevant published work is https://www.aclweb.org/anthology/W17-3207.pdf - it also includes references to the works that introduced beam search for sequence models in the first place. – Mathias Müller Jan 29 '20 at 13:59