Recurrent Neural Networks/Attention Is All You Need

This page collects notes on Vaswani et al. (2017), "Attention Is All You Need."

Readings

 * 1) arXiv.org: Attention Is All You Need

Key Questions

 * 1) What is an encoder-decoder recurrent neural network (RNN)?
 * 2) What improvements does the attention mechanism provide over RNN encoder-decoder?
 * 3) How does the model architecture in Vaswani et al. 2017 differ from prior applications of attention in natural language processing?
 * 4) What is the difference between self-attention and regular attention, and what are the benefits of the former compared to the latter?
 * 5) What is the difference between multi-head attention and regular attention, and what are the benefits of the former compared to the latter? (Both mechanisms are illustrated in the code sketch after this list.)
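
As a concrete anchor for questions 4) and 5), here is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V (Eq. 1 of the paper), and of a single multi-head self-attention layer built on it. The weight names W_q, W_k, W_v, W_o and the single-sequence, unbatched shapes are illustrative assumptions, not the paper's exact implementation (which adds batching, masking, and per-head learned projections).

    import numpy as np

    def softmax(x, axis=-1):
        # Subtract the max before exponentiating for numerical stability.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Eq. 1 in the paper).
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
        return softmax(scores, axis=-1) @ V

    def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):
        # X: (seq_len, d_model). Q, K, and V are all projections of the same
        # sequence X, which is what makes this *self*-attention. Each head
        # attends in its own d_model/num_heads-dimensional subspace; the head
        # outputs are concatenated and projected by W_o.
        seq_len, d_model = X.shape
        d_head = d_model // num_heads

        def split_heads(M):
            # (seq_len, d_model) -> (num_heads, seq_len, d_head)
            return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

        Q = split_heads(X @ W_q)
        K = split_heads(X @ W_k)
        V = split_heads(X @ W_v)
        heads = scaled_dot_product_attention(Q, K, V)  # (num_heads, seq_len, d_head)
        concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
        return concat @ W_o

    # Toy usage with random weights (hypothetical sizes, just to check shapes):
    rng = np.random.default_rng(0)
    seq_len, d_model, num_heads = 5, 8, 2
    X = rng.normal(size=(seq_len, d_model))
    W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
    print(multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads).shape)  # (5, 8)

Note that feeding the same sequence X into Q, K, and V is the self-attention case; the decoder's cross-attention in the paper instead takes K and V from the encoder output while Q comes from the decoder state.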