What Google's Transformer brings to Machine Language Understanding

While recurrent neural networks (RNNs) are at the core of the leading approaches to language understanding tasks such as machine translation, the Transformer outperforms recurrent models on academic English-to-German and English-to-French translation benchmarks.

The Transformer, developed by Google, is a novel neural network architecture based on a self-attention mechanism that is believed to be particularly well suited for language understanding.

It requires less computation to train and is a much better fit for modern machine learning hardware, speeding up training by up to an order of magnitude.

In its Research blog post, Google detailed the problems that arise from the way translation models (both recurrent and convolutional) process sentences word by word, and how the Transformer addresses them.

According to Google, the solution is an attention mechanism built into its system, the Transformer.

The attention mechanism matches each word against every other word in the sentence to see whether any of them affect the others in some key way, and it makes the same comparisons as the translated sentence is being constructed.
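
To make this all-pairs comparison concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the operation at the heart of the Transformer. The toy dimensions, the random projection matrices, and the function name self_attention are illustrative assumptions, not Google's actual implementation.

```python
import numpy as np

def self_attention(X, seed=0):
    """X: (seq_len, d_model) matrix of word representations, one row per word."""
    d_model = X.shape[-1]
    rng = np.random.default_rng(seed)
    # In a real model these projections are learned; random values are
    # placeholders so the sketch runs on its own.
    W_q = rng.standard_normal((d_model, d_model))
    W_k = rng.standard_normal((d_model, d_model))
    W_v = rng.standard_normal((d_model, d_model))

    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Every word's query is scored against every other word's key.
    scores = Q @ K.T / np.sqrt(d_model)
    # Softmax turns each row of scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each word's new representation is a weighted mix of all words' values.
    return weights @ V, weights

# Example: a "sentence" of 4 words, each represented by an 8-dimensional vector.
X = np.random.default_rng(1).standard_normal((4, 8))
output, attention = self_attention(X)
print(attention.round(2))  # 4x4 matrix of word-to-word attention weights
```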

Google's approach also gives a view into the logic of the system: because the Transformer assigns each word a score in relation to every other word, you can see which words it considers to be related.

Beyond computational performance and higher accuracy, another intriguing aspect of the Transformer is that you can visualize which other parts of a sentence the network attends to when it processes or translates a given word.
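
As an illustration of what such a visualization might look like, the sketch below plots an attention-weight matrix as a heatmap with matplotlib. The sentence and the weights are made-up placeholders; a real plot would use the weights produced by a trained model rather than a random number generator.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder sentence and attention weights (each row sums to 1).
words = ["the", "animal", "was", "tired"]
attention = np.random.default_rng(2).dirichlet(np.ones(len(words)), size=len(words))

fig, ax = plt.subplots()
im = ax.imshow(attention, cmap="viridis")
ax.set_xticks(range(len(words)))
ax.set_xticklabels(words)
ax.set_yticks(range(len(words)))
ax.set_yticklabels(words)
ax.set_xlabel("word being attended to")
ax.set_ylabel("word being processed")
ax.set_title("Self-attention weights (illustrative)")
fig.colorbar(im, ax=ax, label="attention weight")
fig.tight_layout()
plt.show()
```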