nlp_intro

6. Text encoding with Transformer NNs

Explanations, formulas, visualisations:

 

Encoder-decoder framework (similar to earlier seq2seq)
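
A minimal sketch of the encoder-decoder setup (the hyperparameters and tensor shapes below are illustrative assumptions), using PyTorch's nn.Transformer: the encoder turns the source sequence into contextual states, and the decoder attends to those states while producing the target sequence.

import torch
import torch.nn as nn

d_model = 512                          # hidden size (assumed)
model = nn.Transformer(d_model=d_model, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 32, d_model)      # random stand-in for embedded source tokens: (src length, batch, d_model)
tgt = torch.rand(20, 32, d_model)      # random stand-in for embedded target tokens: (tgt length, batch, d_model)
out = model(src, tgt)                  # decoder output: (tgt length, batch, d_model)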

 

Similarity to word2vec: training with self-supervision
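
A rough sketch of the self-supervised objective, with a trivial stand-in for the Transformer encoder and assumed toy sizes: some input tokens are hidden behind a mask id and the model is trained to predict them from the remaining context, so the training labels come from the raw text itself, just as in word2vec's objectives.

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, mask_id = 1000, 64, 3             # assumed toy values
encoder = nn.Sequential(                               # stand-in for a Transformer encoder
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
)

token_ids = torch.randint(4, vocab_size, (8, 12))      # a batch of token-id "sentences"
mask = torch.rand(token_ids.shape) < 0.15              # hide ~15% of the positions
corrupted = token_ids.masked_fill(mask, mask_id)       # corrupt the input

logits = encoder(corrupted)                            # (batch, seq_len, vocab_size)
loss = F.cross_entropy(logits[mask], token_ids[mask])  # predict the hidden tokens
loss.backward()                                        # the labels came from the text itself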

 

Difference from word2vec: better, contextual, “dynamic” (sub)word vectors
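
A sketch of the contrast with word2vec, using the Hugging Face transformers library (the model name and example sentences are just illustrative choices): the vector computed for "bank" depends on its sentence, whereas a word2vec lookup table would return the same static vector in both cases.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, hidden_size)
    bank_id = tokenizer.convert_tokens_to_ids("bank")
    position = inputs.input_ids[0].tolist().index(bank_id)
    return hidden[position]

v1 = bank_vector("She sat on the river bank.")
v2 = bank_vector("He deposited the cash at the bank.")
print(torch.cosine_similarity(v1, v2, dim=0))           # below 1: the vectors differ with context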

 

Reasons for the large number of parameters
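
One way to see where the parameters go is a back-of-the-envelope count with assumed, BERT-base-like sizes (vocabulary 30522, hidden size 768, 12 layers, feed-forward width 3072): the embedding tables and, above all, the attention projections and feed-forward matrices repeated over many layers dominate the total.

vocab, d, layers, d_ff, max_pos = 30522, 768, 12, 3072, 512

embeddings = (vocab + max_pos + 2) * d            # token + position + segment tables
attention_per_layer = 4 * d * d                   # Q, K, V and output projections
ffn_per_layer = 2 * d * d_ff                      # the two feed-forward matrices
per_layer = attention_per_layer + ffn_per_layer   # biases and LayerNorms omitted

total = embeddings + layers * per_layer
print(f"{total / 1e6:.0f}M parameters")           # ~109M, close to BERT-base's ~110M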

 

Subword tokenization: similar to previous seq2seq, different from word2vec

(More details in the following lectures)
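
A small sketch of subword tokenization with a pretrained WordPiece vocabulary, via the Hugging Face transformers library (the model name is one example, and the exact splits depend on the learned vocabulary): frequent words tend to stay whole, while rare words are broken into known, '##'-marked pieces.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.tokenize("language"))           # a frequent word: kept as one token
print(tokenizer.tokenize("untranslatability"))  # a rare word: split into several subword pieces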

 

Generalised attention: different from the previous seq2seq attention

(More details in the following lectures)
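
As a preview, a minimal sketch of scaled dot-product attention (the shapes are illustrative assumptions): each query position scores its similarity to every key position and takes the corresponding weighted sum of the values; in self-attention, queries, keys and values all come from the same sequence, rather than linking a decoder to encoder states as in the earlier seq2seq attention.

import torch
import torch.nn.functional as F

def attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # query-key similarities
    weights = F.softmax(scores, dim=-1)             # attention distribution per query
    return weights @ V                              # weighted sum of the values

Q = K = V = torch.rand(1, 5, 64)                    # self-attention: all three from one sequence
out = attention(Q, K, V)                            # (1, 5, 64): one context vector per position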