Fundamentals of Large Language Models (LLMs)
Introduction
Large Language Models (LLMs) are at the core of today’s AI revolution. They power chatbots, search engines, writing assistants, coding tools, and even multimodal systems that can interpret images, audio, and video.
This article consolidates all essential LLM fundamentals — giving you a solid foundation before diving into deeper posts.
1. What Are Large Language Models (LLMs)?
LLMs are advanced AI models trained on massive amounts of text data. Their goal is to understand, generate, and reason with natural language in a human-like way.
Modern LLMs can:
- Answer questions
- Summarise long documents
- Generate content
- Write and debug code
- Perform reasoning and analysis
- Interpret images (multimodal models)
- Assist in data tasks
They are not just text generators — they are general-purpose language intelligence systems.
2. NLP vs LLMs vs Generative AI
Comparison Table
| Area | NLP | LLM | Gen AI |
|---|---|---|---|
| Definition | Rules + classical ML + deep learning | Neural models trained on massive text data | AI that generates new content |
| Scope | Field of computational linguistics | Language understanding + generation | Images, audio, video, text |
| Examples | Tokenization, POS tagging | GPT-4, Claude, LLaMA | Midjourney, DALL·E, LLMs |
Key takeaway:
All LLMs are part of Generative AI, but not all Generative AI systems are LLMs.
3. How LLMs Work (High-Level Overview)
LLMs generate language by predicting the next token based on previous tokens. They rely on the Transformer architecture, which allows them to understand complex relationships between words using Attention.
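Next-token prediction can be illustrated with a toy sketch. The "model" below is just a hand-written lookup table (a real LLM computes these probabilities with a neural network over a vocabulary of tens of thousands of tokens), but the greedy decoding loop mirrors how generation works: predict one token, append it, repeat.

```python
# Toy illustration of next-token prediction (not a real model):
# the "model" is a lookup table mapping a short context to a
# probability distribution over possible next tokens.
toy_model = {
    ("the", "cat"): {"sat": 0.7, "ran": 0.2, "slept": 0.1},
    ("cat", "sat"): {"on": 0.9, "down": 0.1},
    ("sat", "on"): {"the": 0.8, "a": 0.2},
}

def greedy_next_token(context):
    """Pick the most probable next token given the last two tokens."""
    dist = toy_model[tuple(context[-2:])]
    return max(dist, key=dist.get)

tokens = ["the", "cat"]
for _ in range(3):
    tokens.append(greedy_next_token(tokens))

print(tokens)  # ['the', 'cat', 'sat', 'on', 'the']
```

Real models also sample from the distribution instead of always taking the top token, which is what "temperature" controls.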
Core Concepts Behind LLMs
A. Tokenization
Text is broken into tokens: words, subwords, or characters. LLMs operate on these tokens, not on raw sentences.
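A rough sketch of subword tokenization: greedily match the longest vocabulary piece, so a rare word like "unbreakable" splits into familiar pieces. The tiny vocabulary here is made up for illustration; real tokenizers (e.g. BPE) learn their vocabulary from data.

```python
def tokenize(word, vocab):
    """Greedy longest-match subword tokenizer (toy BPE-style sketch)."""
    tokens = []
    while word:
        for end in range(len(word), 0, -1):
            piece = word[:end]
            # Fall back to a single character if no piece matches.
            if piece in vocab or end == 1:
                tokens.append(piece)
                word = word[end:]
                break
    return tokens

print(tokenize("unbreakable", {"un", "break", "able"}))
# ['un', 'break', 'able']
```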
B. Embeddings
Each token is converted into a numerical vector that represents meaning. Words with similar meanings have similar embeddings.
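"Similar meanings have similar embeddings" is usually measured with cosine similarity. The 3-dimensional vectors below are invented for illustration (real embeddings have hundreds or thousands of dimensions), but the comparison is the standard one.

```python
import math

# Hypothetical toy embeddings; values are made up for illustration.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(sim_royal > sim_fruit)  # True: related words sit closer together
```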
C. Transformer Blocks
The central building units of LLMs. Each block processes tokens using:
- Multi-head attention
- Feed-forward networks
- Layer normalization
- Residual connections
These allow the model to understand context and relationships efficiently.
D. Attention Mechanism (Core Concept)
Attention lets the model decide which words matter most. Example: In the sentence “The animal didn’t cross the road because it was too tired,” attention helps identify that it refers to the animal, not the road.
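The mechanism behind this is scaled dot-product attention: each token's query is compared against every token's key, the scores are turned into weights with a softmax, and the output is a weighted mix of the value vectors. A minimal sketch with NumPy, using arbitrary numbers for 3 tokens:

```python
import numpy as np

# 3 tokens, 4-dimensional query/key vectors; values are arbitrary.
Q = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
K = Q.copy()  # self-attention: queries and keys come from the same tokens
V = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[1])         # pairwise similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row
    return weights @ V                             # weighted mix of values

out = attention(Q, K, V)
print(out.shape)  # (3, 2): one mixed value vector per token
```

In the "tired animal" sentence, the row of attention weights for "it" would put most of its mass on "animal", so the output vector for "it" is dominated by the animal's representation.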
4. Types of LLMs (Introduction Level)
There are several categories of LLMs, each designed for different purposes.
A. Decoder-Only Models (Most common today)
Predict text one token at a time.
Examples: GPT-3/4/5, LLaMA, Mistral, Claude
Best for: Chatbots, reasoning, content generation
B. Encoder-Only Models (Understanding focused)
Extract meaning but don’t generate long text.
Examples: BERT, RoBERTa
Best for: Search, classification, sentiment analysis
C. Encoder-Decoder Models
Input is processed by the encoder; output is generated by the decoder.
Examples: T5, BART
Best for: Translation, summarisation, text correction
D. Multimodal LLMs
Understand text + images (and sometimes audio/video).
Examples: GPT-4o, Gemini, Claude 3
Best for: Image reasoning, document analysis
E. Domain-Specific LLMs
Fine-tuned for specific industries.
Examples: Med-PaLM (medical), StarCoder (coding)
F. Open-Source vs Proprietary Models
| Type | Meaning | Examples |
|---|---|---|
| Open Source | Weights available to run locally | LLaMA, Mistral |
| Proprietary | API access only | GPT-4/5, Claude, Gemini |
This classification gives you a conceptual map before we go deeper in future articles.
5. Training the LLM: How Knowledge Is Learned
LLMs learn from text by identifying patterns, structure, semantics, and reasoning relationships. The training pipeline usually consists of:
A. Pretraining (General Knowledge)
Models learn grammar, facts, and reasoning patterns from huge text datasets.
B. Supervised Fine-Tuning (Task Training)
Models learn via labeled examples (e.g., question → answer).
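Supervised fine-tuning data is commonly stored as prompt/completion pairs, often one JSON object per line (JSONL). A sketch of what such a dataset looks like, with made-up examples and field names that vary between providers:

```python
import json

# Hypothetical labeled examples: each record pairs an input with the
# desired output the model should learn to produce.
examples = [
    {"prompt": "What is the capital of France?", "completion": "Paris."},
    {"prompt": "Translate 'hello' to Spanish.", "completion": "Hola."},
]

# Serialise one example per line (JSONL), a common fine-tuning format.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl.splitlines()[0])
```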
C. Instruction Tuning
Models learn to follow human-style instructions.
D. RLHF (Reinforcement Learning from Human Feedback)
Human raters choose the best responses.
The model learns preferred behaviour, style, and safety.
6. What Can LLMs Do? (Capabilities)
Modern models can:
- Generate text
- Summarise documents
- Answer complex questions
- Translate languages
- Analyse sentiment
- Solve reasoning problems
- Write/debug code
- Understand images (multimodal)
These capabilities expand continuously as models evolve.
7. Limitations of LLMs
Despite their power, LLMs have limitations:
- Hallucinations: confident but wrong answers
- Bias: inherited from training data
- Prompt sensitivity: small wording changes can alter outputs
- Limited memory: fixed context windows cap how much text they can consider at once
- Slow inference on long contexts
Understanding limitations is crucial for proper usage.
8. Transition to Prompt Engineering
Now that you understand the fundamentals of LLMs, the next step is learning how to communicate with them effectively.
Better prompts lead to better results — regardless of how powerful the model is.
--Infinite Ripples | HK
