NLP, Transformers, and LLMs - An Overview

In recent years, breakthroughs in artificial intelligence (AI) have transformed how machines understand and generate human language. This article offers a structured overview of the technologies that made these breakthroughs possible, namely NLP, Transformers, and LLMs, by summarising how they relate, differ, and build upon one another.

At the heart of this evolution lies Natural Language Processing (NLP), a field that bridges linguistics and computer science. Among its most prominent developments are Transformers, a deep learning architecture that revolutionized language modeling, and Large Language Models (LLMs), such as GPT-4 and Claude. LLMs leverage Transformers to perform a wide range of language tasks with remarkable fluency.


The relationship between NLP, LLMs, and Transformers can be understood as a hierarchy and technological progression: NLP is the broad field, Transformers are a key architecture within it, and LLMs are large-scale models built on that architecture.

What is NLP (Natural Language Processing)?

NLP is a subfield of AI and linguistics focused on the interaction between computers and human (natural) languages. By utilizing various computational operations and analyses, it enables computers to understand, interpret, generate, and respond to human language in a meaningful way.

Key Components of NLP

  1. Text Preprocessing: As the first step, the raw input text is typically cleaned and normalized (see the pipeline sketch after this list). It consists of:
    1. Tokenization: Splitting text into words or sentences
    2. Lowercasing, stemming, lemmatization
    3. Stop word removal: Removing words like “the”, “is”, “and”
    4. Part-of-speech tagging
  2. Syntactic Analysis: Then, the grammatical structure of sentences is analyzed via:
    1. Parsing: Analyzing sentence structure
    2. Dependency parsing: Finding relations between words
  3. Semantic Analysis: This step is about understanding meaning in the context by utilizing:
    1. Named Entity Recognition (NER): Identifying people, places, organizations
    2. Word sense disambiguation: Figuring out word meaning based on context
    3. Coreference resolution: Resolving “he”, “she”, “it”, etc. to the actual entity
  4. Discourse & Pragmatic Analysis: understanding language beyond individual sentences to handle sarcasm, idioms, or context from previous conversation turns.
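
As a concrete illustration of several of the steps above, here is a minimal sketch using the spaCy library (a choice made here for illustration; the article itself doesn't prescribe a library). It shows tokenization, lemmatization, stop-word detection, part-of-speech tagging, dependency parsing, and NER in one pass:

```python
import spacy

# A small English pipeline; requires:
#   pip install spacy
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Text preprocessing: tokenization, lowercasing, lemmatization,
# stop-word flags, and part-of-speech tags in a single pass
for token in doc:
    print(token.text, token.lower_, token.lemma_, token.pos_, token.is_stop)

# Syntactic analysis: dependency relations between words
for token in doc:
    print(token.text, "->", token.dep_, "->", token.head.text)

# Semantic analysis: Named Entity Recognition
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. Apple ORG, U.K. GPE, $1 billion MONEY
```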

Common NLP Tasks

Typical tasks tackled with NLP include machine translation, sentiment analysis, text summarization, question answering, text classification, and speech recognition.

Challenges in NLP

Human language remains hard for machines because of ambiguity, context-dependence, sarcasm and idioms, domain-specific vocabulary, and the scarcity of data for low-resource languages.

For further reading on NLP, the Wikipedia article on natural language processing provides a good overview of the field, its history, and its use cases.

What is a Transformer?

A Transformer is a neural network (NN) architecture designed to handle sequential data. It was introduced in the influential 2017 research paper Attention Is All You Need by Vaswani et al. Transformers are the foundation of all major modern language models, such as BERT, GPT, and T5.

Transformers have revolutionized NLP and many other AI fields thanks to the following aspects:

  1. Parallelism: entire sequences are processed at once, unlike the step-by-step processing of RNNs, which makes training far faster.
  2. Long-range context: self-attention relates every token to every other token, regardless of distance.
  3. Scalability: the architecture scales well to very large models and datasets.
  4. Transfer learning: models pretrained on large corpora can be fine-tuned for many downstream tasks.

Simplified Transformer flow (figure)

The video by 3Blue1Brown below explains Transformers, the technology behind LLMs:

Key Concepts in Transformers

  1. Self-Attention Mechanism: allows the model to weigh the importance of each word in the input relative to every other word, regardless of distance (see the attention sketch after this list).
  2. Positional Encoding: added to each token embedding to preserve word order, since Transformers don’t process tokens sequentially (see the second sketch below).
  3. Multi-Head Attention: runs multiple attention mechanisms in parallel to learn different relationships between words simultaneously.
  4. Layer Normalization and Residual Connections: stabilize and speed up model training.
  5. Feedforward Layers: after attention, each token passes through dense layers for further processing.
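
To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation from Attention Is All You Need. The toy inputs and the reuse of the same matrix for Q, K, and V are illustrative assumptions; in a real Transformer, Q, K, and V come from learned linear projections of the token embeddings:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq, seq): every token scored against every other
    weights = softmax(scores, axis=-1)  # each row sums to 1: relative importance of each word
    return weights @ V                  # weighted sum of value vectors

# Toy example: 4 tokens with embedding dimension 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # real models use learned projections here
print(out.shape)  # (4, 8)
```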
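
And here is a sketch of the sinusoidal positional encoding described in the same paper; the sequence length and model dimension below are arbitrary toy values:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...)"""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]  # (1, d_model/2): even dimensions
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # sine on even indices
    pe[:, 1::2] = np.cos(angles)  # cosine on odd indices
    return pe

# Added element-wise to the token embeddings before the first layer
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
print(pe.shape)  # (4, 8)
```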

Transformer Architecture: Encoder & Decoder

  1. Encoder
    1. Takes the input (e.g., sentence) and converts it into a contextual representation.
    2. Used in models like BERT, RoBERTa, DistilBERT.
  2. Decoder
    1. Takes the encoded input and generates output (e.g., next word).
    2. Used in models like GPT, GPT-2/3/4.
  3. Encoder–Decoder
    1. Combines both parts: encoder processes input, decoder generates output.
    2. Used in models like T5, BART, and the original Transformer for translation (see the sketch below).
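
As a hedged illustration of the three variants, the sketch below uses the Hugging Face transformers library (an assumption made here; the article doesn't prescribe any library). Each pipeline downloads a small pretrained model on first use:

```python
from transformers import pipeline  # pip install transformers torch

# Encoder-only (BERT): fills in a masked token using bidirectional context
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("The capital of France is [MASK].")[0]["token_str"])  # likely "paris"

# Decoder-only (GPT-2): generates a continuation left to right
generate = pipeline("text-generation", model="gpt2")
print(generate("Transformers are", max_new_tokens=10)[0]["generated_text"])

# Encoder-decoder (T5): encodes the input, decodes a translation
translate = pipeline("translation_en_to_de", model="t5-small")
print(translate("The house is wonderful.")[0]["translation_text"])
```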

Example Transformer Use Cases

Typical applications include machine translation (the original use case of the Transformer), text summarization, question answering, text and code generation, and sentiment analysis.

Further reading on Transformers: the Wikipedia article on the Transformer deep learning architecture.

What are LLMs (Large Language Models)?

LLMs, a subclass of NLP models, are massive neural networks (NNs) trained on vast amounts of text data using the Transformer architecture to understand and generate human language. They can recognize patterns, comprehend context, and produce coherent and relevant responses. They owe their fame to their capability to handle many NLP tasks without task-specific training (called zero-shot/few-shot learning). LLMs are typically defined by their scale: parameter counts in the billions or more, training corpora drawn from web-scale text, and general-purpose capabilities across many tasks.
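
To illustrate the zero-shot/few-shot distinction, here is a small sketch; ask_llm is a hypothetical placeholder for whatever provider API you use, not a real function from any library:

```python
# Zero-shot: the task is described in plain language, with no examples.
zero_shot_prompt = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

# Few-shot: a handful of in-context examples precede the real query.
few_shot_prompt = (
    "Review: I love this phone. Sentiment: positive\n"
    "Review: Terrible screen and awful speakers. Sentiment: negative\n"
    "Review: The battery died after two days. Sentiment:"
)

def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder: wire in your LLM provider's API here."""
    raise NotImplementedError

# The same pretrained model handles the task either way, without any
# task-specific fine-tuning; the few-shot prompt only adds examples.
```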

LLM examples

Well-known examples are:

  1. GPT-4 (OpenAI)
  2. Claude (Anthropic)
  3. Gemini (Google)
  4. LLaMA (Meta)

The video by 3Blue1Brown below explains LLMs:

Further reading on LLMs: the Wikipedia article on Large language models.
