9: Introduction to Transformers
Date: 11th December 2024
💡 Transformers were initially introduced for machine translation, but are now the dominant, state-of-the-art architecture across most deep learning tasks. Unlike traditional neural networks, Transformers rely on a mechanism called attention, which allows them to focus on relevant parts of the input sequence. Unlike RNNs, this architecture processes the entire input sequence in parallel rather than one token at a time.
Central to this architecture are the encoder and decoder blocks: input text is tokenized, embedded into vectors, and combined with positional encodings that capture word order (see the sketches below). This week, we will explore the attention mechanism, including multi-headed attention, the structure of encoder and decoder blocks, and the processes involved in training Transformers, such as tokenization, masking strategies, and managing computational costs. 💡
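To make the positional-encoding idea concrete, here is a minimal NumPy sketch of the sinusoidal encodings used in the original Transformer paper ("Attention Is All You Need"). The function name and shapes are our own choices for illustration, not from any particular library:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return an array of shape (seq_len, d_model) that is added to token embeddings."""
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    dims = np.arange(d_model)[None, :]        # (1, d_model)
    # Each pair of dimensions uses a different wavelength, so the model can
    # recover both absolute and relative positions from the encoding.
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates          # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])     # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])     # odd dimensions: cosine
    return pe

print(sinusoidal_positional_encoding(10, 16).shape)  # (10, 16)
```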
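And here is a minimal sketch of scaled dot-product attention, the core operation behind multi-headed attention, with an optional causal mask of the kind used in decoder blocks. This is an illustrative single-head version in plain NumPy; variable names and shapes are assumptions for the example, not a reference implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal_mask=False):
    """Q, K, V: arrays of shape (seq_len, d_k). Returns an array of shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k) for stability.
    scores = Q @ K.T / np.sqrt(d_k)           # (seq_len, seq_len)
    if causal_mask:
        # Decoder-style masking: each position may only attend to itself
        # and to earlier positions in the sequence.
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -np.inf)
    # Softmax over the key dimension gives attention weights for each query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # weighted sum of value vectors

# Toy usage: 4 tokens, model dimension 8, self-attention with a causal mask.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x, causal_mask=True)
print(out.shape)  # (4, 8)
```

Multi-headed attention simply runs several such attention operations in parallel on learned projections of Q, K, and V, then concatenates the results.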
You can access our slides here: 💻 Tutorial 9 Slides