
11. Cross-lingual transfer and multilingual NLP

Explanations and visualisations

 

Fine-tune

Continue pre-training

Test
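The three steps above (continue pre-training on raw target-language text, fine-tune on labelled source-language data, test on the target language) can be sketched with a deliberately tiny toy model. This is an illustration only, not a real pretrained model: the "shared representation" is just a character n-gram vocabulary built over both languages, the classifier is a class-centroid scorer, and all sentences and labels are invented for the sketch.

```python
def ngrams(text, n=3):
    # Pad with spaces so word boundaries produce n-grams too.
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def featurize(text, vocab):
    # Bag-of-n-grams vector over the shared vocabulary.
    grams = ngrams(text)
    return [grams.count(g) for g in vocab]

# Source language (English): labelled training data.
en_train = [("the film was great", 1), ("a great idea", 1),
            ("the film was terrible", 0), ("a terrible idea", 0)]
# Target language (Spanish): no labels are used at training time.
es_test = ["la película fue terrible", "una idea terrible"]

# "Continue pre-training": build a shared vocabulary from raw text
# in both languages (stand-in for continued LM pre-training).
raw = [t for t, _ in en_train] + es_test
vocab = sorted({g for t in raw for g in ngrams(t)})

# "Fine-tune": compute one centroid per class from English data only.
def centroid(texts):
    vecs = [featurize(t, vocab) for t in texts]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

pos = centroid([t for t, y in en_train if y == 1])
neg = centroid([t for t, y in en_train if y == 0])

def predict(text):
    v = featurize(text, vocab)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return 1 if dot(v, pos) > dot(v, neg) else 0

# "Test": transfer works here only because "terrible" shares character
# n-grams across English and Spanish; real models rely on much richer
# shared multilingual representations.
preds = [predict(t) for t in es_test]
```

Both Spanish test sentences are classified as negative without any Spanish supervision, which is the zero-shot transfer pattern in miniature.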

 

Cross-lingual transfer

 

Multilingual data sets

Only text

 

Parallel data (for machine translation)

 

Annotated for syntactic parsing

 

Annotated for semantic NLP tasks (sentiment, similarity, inference, question-answering, …)

Many multilingual data sets are created from a selection of data taken from Common Crawl.

 

Multilingual pre-trained models

BERT-type

GPT-type

Full Transformers

Models with multiple encoders and decoders (not Transformer-based)

Other pre-trained models are typically trained for a single language or a group of related languages (e.g. IndicBERT, AraBERT, BERTić).

 

Language similarity and transfer

 


 

Language vectors
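Language vectors represent each language as a feature vector, for example of typological properties (in the spirit of the URIEL database and lang2vec), so that similarity between languages can be computed numerically and used to pick a source language for transfer. A minimal sketch, with simplified toy feature values that are not real database entries:

```python
import math

# Toy typological vectors; features (in order):
# [dominant SVO word order, uses articles, rich case marking, pro-drop].
# Values are illustrative simplifications, not URIEL entries.
lang_vectors = {
    "english": [1, 1, 0, 0],
    "german":  [0, 1, 1, 0],
    "spanish": [1, 1, 0, 1],
    "turkish": [0, 0, 1, 1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_transfer_languages(target, candidates):
    # Rank candidate source languages by similarity to the target:
    # a simple heuristic for choosing a transfer language.
    t = lang_vectors[target]
    return sorted(candidates,
                  key=lambda c: cosine(lang_vectors[c], t),
                  reverse=True)

ranking = rank_transfer_languages("spanish", ["english", "german", "turkish"])
```

With these toy vectors, English ranks first as a transfer source for Spanish; real systems use hundreds of features and often combine typological, geographic, and corpus-derived vectors.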

 

Benefits of multilingual NLP