Tanja Samardžić, University of Geneva, Lecture notes, Autumn 2024

Introduction to NLP

(Traitement automatique du langage naturel - TALN)

These notes are a guide to the most important notions and terminology in contemporary Natural Language Processing (NLP). Most of the notions mentioned here are explained in the listed sources. A few visualisations are included for a better overview and a more intuitive understanding. The course also includes a practical part, which is managed on Moodle.

 

Online textbooks:

Blogs and other learning resources:

Topics

1. NLP tasks, data sets, benchmarks

2. Large language models (LLMs), artificial intelligence (AI) and natural language processing (NLP), history of NLP

3. Evaluation, data splits

4. History of language modelling

5. Representing the meaning of words with word2vec

6. Language modelling with Transformer NNs

7. Attention in language modelling

8. Subword tokenization

9. Transfer learning: performing tasks with pre-trained models

10. Cross-lingual transfer and multilingual NLP

11. History of NN architectures in NLP: CNNs, LSTMs

12. What is knowledge about language?