
10. Transfer learning: Performing tasks with decoder-type pre-trained models

Explanations and visualisations:


[Figure: decoding (beam search illustration)]
Source: https://d2l.ai/chapter_recurrent-modern/beam-search.html
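The linked figure illustrates beam search decoding. As a minimal sketch of the algorithm, here is beam search over a toy next-token distribution in Python (the probability table is invented purely for illustration; greedy decoding is the special case beam_size=1):

import math

# Toy next-token distribution: a stand-in for a real language model.
# Maps a prefix (tuple of tokens) to {token: probability}; the numbers
# are made up for illustration only.
def next_token_probs(prefix):
    table = {
        (): {"A": 0.5, "B": 0.4, "<eos>": 0.1},
        ("A",): {"A": 0.1, "B": 0.3, "<eos>": 0.6},
        ("B",): {"A": 0.6, "B": 0.2, "<eos>": 0.2},
    }
    return table.get(prefix, {"<eos>": 1.0})

def beam_search(beam_size=2, max_len=5):
    # Each hypothesis: (cumulative log-probability, token sequence).
    beams = [(0.0, ())]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq and seq[-1] == "<eos>":
                candidates.append((logp, seq))  # finished hypotheses carry over
                continue
            for tok, p in next_token_probs(seq).items():
                candidates.append((logp + math.log(p), seq + (tok,)))
        # Keep only the beam_size highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    return beams

print(beam_search())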

 

BERT (Bidirectional Encoder Representations from Transformers) vs. GPT (Generative Pre-trained Transformer)
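The core architectural contrast: BERT's encoder uses bidirectional self-attention (every token can see the whole sequence), while GPT's decoder uses causal self-attention (each token sees only its predecessors). A minimal sketch of the two attention masks, assuming the convention 1 = "may attend":

import numpy as np

seq_len = 5

# BERT-style (encoder): bidirectional attention -- every position may
# attend to every other position, so the mask is all ones.
encoder_mask = np.ones((seq_len, seq_len), dtype=int)

# GPT-style (decoder): causal attention -- position i may attend only
# to positions <= i, giving a lower-triangular mask.
decoder_mask = np.tril(np.ones((seq_len, seq_len), dtype=int))

print(encoder_mask)
print(decoder_mask)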

Decoding vs. prediction
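A decoder-type model makes a prediction at each step (a probability distribution over the next token); decoding is the procedure that chains these per-step predictions into a full output sequence. A minimal greedy-decoding sketch over an invented toy distribution (all tokens and probabilities below are illustrative, not from the slides):

def predict(prefix):
    # Stand-in for one forward pass of a language model.
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.7, "<eos>": 0.3},
        ("the", "cat"): {"<eos>": 0.9, "sat": 0.1},
    }
    return table.get(prefix, {"<eos>": 1.0})

def greedy_decode(max_len=10):
    seq = ()
    for _ in range(max_len):
        probs = predict(seq)             # one prediction step
        tok = max(probs, key=probs.get)  # take the argmax token
        seq += (tok,)                    # feed it back as context
        if tok == "<eos>":
            break
    return seq

print(greedy_decode())  # ('the', 'cat', '<eos>')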

GPT after pre-training
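After pre-training alone, GPT can already perform tasks by conditioning on a prompt and decoding a continuation, with no task-specific fine-tuning. A minimal sketch, assuming the Hugging Face transformers library and the public gpt2 checkpoint (neither is prescribed by these notes):

# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The pre-trained model performs the task purely by conditioning on a
# prompt and decoding a continuation; no parameters are updated here.
inputs = tokenizer("The Transformer architecture", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=15,                    # length of the continuation
    num_beams=3,                          # beam search (greedy if 1)
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))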

Pre-training, training, testing (summary)
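As a compact summary of how the three phases relate, here is a runnable toy sketch; the Model class below is a trivial stand-in invented for illustration, not a real training API:

# Trivial stand-in model, invented for illustration only.
class Model:
    def __init__(self):
        self.examples = []
    def update(self, example):   # stand-in for one gradient step
        self.examples.append(example)
    def predict(self, x):        # stand-in for a forward pass
        return "positive"

model = Model()

# 1. Pre-training: self-supervised (next-token) objective on unlabelled text.
for text in ["large unlabelled corpus ..."]:
    model.update(("language-modelling", text))

# 2. Training (fine-tuning): supervised updates on labelled task data.
for x, y in [("great movie", "positive"), ("awful plot", "negative")]:
    model.update(("task", x, y))

# 3. Testing: accuracy on held-out data, with no further updates.
test_set = [("loved it", "positive")]
print(sum(model.predict(x) == y for x, y in test_set) / len(test_set))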