AI Glossary

Essential AI Dictionary

83 key terms to understand the conversation about artificial intelligence in 2026 — explained in plain English.

LLM

Large Language Model. A model trained on massive amounts of text to predict the next word. ChatGPT, Claude, and Gemini are LLMs.

Transformer

Neural network architecture introduced by Google researchers in 2017 in the paper 'Attention Is All You Need'. The foundation of practically all modern LLMs.

Fine-tuning

Retraining a pre-existing model with specific data to specialize it for a concrete task or style.

RAG

Retrieval-Augmented Generation. A technique where the model searches an external database for information before answering, reducing hallucinations.
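
As a rough illustration of the retrieve-then-answer flow, the sketch below ranks a few documents by word overlap with the question and pastes the best match into the prompt. Real systems usually retrieve with embeddings and a vector database; the word-overlap scoring, the sample documents, and the function names here are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words as a set."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_prompt(question, documents, top_k=1):
    """Put the retrieved text in front of the question before calling a model."""
    context = "\n".join(retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical knowledge base, invented for this example.
documents = [
    "The refund window is 30 days from delivery.",
    "Support is open Monday to Friday.",
    "Shipping takes 3 to 5 business days.",
]
question = "How long is the refund window?"
print(build_rag_prompt(question, documents))
```

The resulting prompt would then be sent to an LLM, which can answer from the retrieved sentence rather than from memory alone.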

Hallucination

When a model invents false information but presents it confidently. The central risk of LLMs.

Token

The smallest unit of text an LLM processes. One token is roughly 0.75 words in English. Providers bill and set limits by token count.
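
The 0.75 words-per-token rule of thumb can be turned into a quick estimate. This is only a sketch: the function name is made up, and a real count depends on the specific model's tokenizer.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (English rule of thumb only)."""
    word_count = len(text.split())
    return round(word_count / words_per_token)


# 9 words / 0.75 words per token = 12 tokens (estimated)
print(estimate_tokens("The quick brown fox jumps over the lazy dog"))
```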

Embedding

Numerical representation of text in a vector space. Allows searching by meaning instead of literal matching.
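
To make "searching by meaning" concrete, the sketch below compares vectors with cosine similarity, a common way to measure how close two embeddings are. The three-dimensional vectors are hand-written stand-ins; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to the given word's."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("car", embeddings))
```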

Prompt Engineering

The art of formulating instructions to get the best output from an LLM. Includes role, context, constraints, and formatting.

Zero-shot

Getting a model to perform a task without giving it prior examples. The most surprising emergent ability of LLMs.

Few-shot

Giving the model 2-5 examples of the desired output within the prompt itself so it can clone the pattern.
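
A few-shot prompt is ordinary text with the examples placed before the new input. The sketch below assembles one for a sentiment task; the "Review:" and "Sentiment:" labels and the sample reviews are assumptions chosen for the example, not a required format.

```python
def build_few_shot_prompt(examples, new_input):
    """Place labeled examples before the new input so the model can copy the pattern."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical examples, invented for illustration.
examples = [
    ("Loved it, would buy again.", "positive"),
    ("Broke after two days.", "negative"),
    ("Does what it says.", "neutral"),
]
print(build_few_shot_prompt(examples, "Fast shipping and great quality."))
```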

RLHF

Reinforcement Learning from Human Feedback. A technique to align the model's behavior with human preferences. This is what made ChatGPT viable.

Diffusion Model

A model that learns to generate images starting from random noise and refining it step-by-step. The basis of Midjourney, DALL-E, and Stable Diffusion.

LoRA

Low-Rank Adaptation. An efficient technique to fine-tune massive models: the original weights stay frozen and only a pair of small low-rank matrices added to them is trained.
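
The savings are easy to see by counting numbers. LoRA, as usually described, leaves a weight matrix of size d_out x d_in frozen and trains two thin matrices of sizes d_out x r and r x d_in instead. The layer size and rank below are assumed values picked for illustration.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values with LoRA: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Assumed example: one 4096 x 4096 layer with rank 8.
full = full_params(4096, 4096)      # 16,777,216
lora = lora_params(4096, 4096, 8)   # 65,536
print(f"LoRA trains {lora / full:.2%} of the layer's weights")
```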

Quantization

Reducing the numerical precision of a model (from 32-bit to 8-bit or 4-bit) so it takes up less memory and runs faster.
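
One simple way to do this is symmetric 8-bit quantization: scale every weight so the largest one maps to 127, round to whole numbers, and keep the scale to convert back. The sketch below shows that single scheme on made-up weights; production formats use more elaborate variants.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, invented for illustration.
weights = [0.5, -1.27, 0.03, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)
print(restored)
```

Each restored value differs from the original by at most half the scale, which is the precision traded away for the smaller size.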

Multimodal

A model that processes several types of data at once: text, image, audio, video. GPT-4o, Claude Sonnet 4.5, and Gemini are multimodal.

Agent

An AI system that executes multi-step tasks autonomously: it investigates, decides, calls APIs, and writes code. Cursor Composer and Devin are examples.

Chain of Thought

A prompting technique asking the model to 'think step by step' before answering. Drastically improves complex reasoning.

Temperature

A parameter that controls the randomness of responses (0 = nearly deterministic, 1 = more varied and creative; some APIs allow values up to 2). 0.7 is a common default.
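
Under the hood, temperature typically divides the model's raw scores (logits) before they are turned into probabilities. The sketch below shows that effect on three made-up scores: a low temperature concentrates probability on the top choice, a high one spreads it out.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled separately as "always pick the top token".
        raise ValueError("temperature must be greater than 0")
    scaled = [x / temperature for x in logits]
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```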

Top-p

A sampling parameter used alongside or instead of Temperature: the model only chooses among the smallest set of most probable tokens whose cumulative probability reaches p (e.g., 0.9 = the top tokens that together cover 90% of the probability).
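
The filtering step can be sketched in a few lines: sort the candidates by probability, keep adding them until the running total reaches p, then rescale what is left. The token probabilities below are invented for the example.

```python
def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    # Renormalize so the kept probabilities sum to 1 again.
    return {token: prob / total for token, prob in kept}


# Hypothetical next-token probabilities.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(top_p_filter(probs, 0.9))  # "rock" is dropped; the rest are rescaled
```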

Context Window

The maximum number of tokens a model can 'remember' in a conversation. Claude Sonnet 4.5: 200K. Gemini Advanced: 2M.
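
When a conversation outgrows the window, applications often drop the oldest messages first. The sketch below shows that one strategy; the word-count token counter is a crude stand-in for a real tokenizer, and the names are hypothetical.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages that fit within the token budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


def count_words(message):
    """Crude stand-in for a tokenizer: one token per word."""
    return len(message.split())


history = ["one two three", "four five", "six"]
print(trim_history(history, max_tokens=3, count_tokens=count_words))
```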

Inference

The act of running a trained model to get a response. Distinct from training.

Benchmark

A standardized test to measure a model's capacity: MMLU (general knowledge), HumanEval (code), GSM8K (math), etc.

GGUF

File format for storing quantized models, optimized for local inference on CPU/GPU. Replaced the old GGML.

Tokenizer

The component that splits text into tokens. Different models use different tokenizers, so the same text has different lengths depending on the model.

Attention

A mechanism that allows the model to 'look' at other parts of the text when processing each word. The heart of the Transformer.
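
A minimal sketch of scaled dot-product attention, the form used in the Transformer: each query is compared with every key, the scores become weights through a softmax, and the output is a weighted average of the values. The tiny vectors are made up, and real models add learned projections and multiple heads that are left out here.

```python
import math


def softmax(scores):
    """Turn scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain Python lists."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this position "looks at" each other position.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Two identical keys get equal weight, so the output is the average of the values.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 0.0]]))
```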

GPT

Generative Pre-trained Transformer. OpenAI's family of models: GPT-3.5, GPT-4, GPT-4o, GPT-4.1.

BERT

Google's Transformer model (2018) optimized for understanding text (not generating). Google began using it in Search in 2019 to better interpret queries.

Stable Diffusion

An open-source image generation model that can run locally. The open-source rival to Midjourney and DALL-E.

GAN

Generative Adversarial Network. Two competing networks: one generates, another discriminates. Precursor to diffusion models for imagery.

Neural Network

A computing system inspired by the brain: nodes ('neurons') connected by adjustable weights. The basis of all modern deep learning.

Base model

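A model that has only been pre-trained on massive data, with no instruction tuning or preference alignment yet. The assistants you chat with (GPT-4o, Claude) are not base models: they are built from a base model through instruction tuning and RLHF.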

Reasoning model

An LLM designed to think step-by-step before answering. OpenAI o1 and Claude 3.7 Sonnet are examples. Slower but more accurate on complex problems.

Parameters

The internal numbers a model adjusts during training. More parameters generally means more capacity. GPT-4 is unofficially estimated at ~1.8 trillion parameters; OpenAI has not published the figure.

Mixture of Experts (MoE)

Architecture that divides parts of the model into specialized sub-networks ("experts") and activates only a few of them for each token. Mixtral uses MoE to be more compute-efficient, and GPT-4 is widely reported (though not confirmed) to use it as well.
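
A toy sketch of the routing idea: a gate scores every expert, only the top k are run, and their outputs are blended using the gate's weights. The experts here are trivial functions and the gate scores are fixed by hand; in a real model both are learned neural network layers.

```python
import math


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and softmax their scores into weights."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    highest = max(gate_scores[i] for i in top)
    exps = {i: math.exp(gate_scores[i] - highest) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and blend their outputs."""
    routing = top_k_route(gate_scores, k)
    return sum(weight * experts[i](x) for i, weight in routing.items())


# Hypothetical experts and gate scores, invented for illustration.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x ** 2]
gate_scores = [0.1, 2.0, -1.0, 2.0]
print(moe_forward(3, experts, gate_scores))  # experts 1 and 3 run; the other two are skipped
```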

Open source model

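A model whose weights (and sometimes code and training details) are publicly available. Llama 3 (Meta) and Mistral are the most popular; strictly speaking, many of these are "open-weight" models released under their own licenses. You can run them locally without paying for an API.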

Proprietary model

A model whose code or weights are not public. GPT-4o, Claude, and Gemini are proprietary. You access them only via API or the provider's interface.

SLM (Small Language Model)

A small language model designed to run on devices with limited memory. Microsoft's Phi-3 and Google's Gemma are SLMs capable of running on mobile devices.

Pre-training

The first phase of training where a model learns to predict text from massive datasets of internet text, books, and code.

Instruction tuning

Fine-tuning where the model is trained to follow instructions in a question-answer format. Converts a base model into a useful assistant.

DPO (Direct Preference Optimization)

A more efficient alternative to RLHF for aligning models with human preferences. Many modern open-source models use DPO instead of RLHF.

Synthetic data

Training data generated by AI rather than collected from humans. Used to train models when real data is scarce or expensive to collect.

API (Application Programming Interface)

Interface that lets your application send text to a model and receive responses. You pay per token consumed. OpenAI, Anthropic, and Google offer model APIs.

Latency

Time for the model to start responding. Critical in real-time applications. Groq stands out for ultra-low latency using specialized hardware (LPUs).

Cost per token

Price charged by the API for the tokens it processes, usually quoted per million tokens. GPT-4o Mini costs ~$0.15/million input tokens. Key for calculating your application's ROI.
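
A quick way to budget is to multiply token volumes by the per-million prices. In the sketch below the input price is the ~$0.15 figure quoted above; the output price and the token volumes are assumed placeholders, so check your provider's current pricing page before relying on the result.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_million, output_price_per_million):
    """Estimate API spend from token counts and per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Assumed monthly volume; the 0.60 output price is a placeholder, not a quoted rate.
cost = estimate_cost_usd(
    input_tokens=2_000_000,
    output_tokens=500_000,
    input_price_per_million=0.15,
    output_price_per_million=0.60,
)
print(f"${cost:.2f}")
```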

Self-hosting

Running a model on your own infrastructure instead of using a provider's API. More control and privacy, but requires GPUs and technical expertise.

System prompt

Instructions you define for the model before the conversation starts. Sets its role, tone, and constraints. Invisible to end users in most apps.

In-context learning

The LLM's ability to learn from examples within the prompt itself without retraining. Adding examples (few-shot) generally helps it follow the pattern, up to a point.

Prompt injection

An attack where malicious text, typed by a user or hidden in content the model reads (a web page, an email, a document), tries to override the system prompt so the model ignores its instructions. The main security vulnerability in LLM-powered apps.

Jailbreak

A technique to bypass a model's safety restrictions using deceptive instructions. Providers continuously patch these attempts.

Grounding

The process of anchoring model responses to verifiable facts, typically via RAG or real-time web search. Dramatically reduces hallucinations.

Agentic AI

AI that doesn't just answer questions but takes autonomous actions: browsing the web, executing code, calling external APIs. A key trend in 2026.

Function calling

The model's ability to invoke external functions (search Google, query a database, send an email) instead of just generating text.
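
In practice the model does not run anything itself: it emits a structured request, and your application executes the matching function. The sketch below assumes the request arrives as JSON with "name" and "arguments" fields; that shape, the tool, and the registry are illustrative assumptions, since each provider defines its own format.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny in {city}"


# Registry of functions the application allows the model to call.
TOOLS = {"get_weather": get_weather}


def execute_tool_call(raw_call: str) -> str:
    """Parse the model's request and run the matching function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Example of what a model might emit instead of plain text (assumed format).
model_output = '{"name": "get_weather", "arguments": {"city": "Madrid"}}'
print(execute_tool_call(model_output))
```

Checking the name against a registry, rather than calling whatever the model asks for, keeps the model limited to functions you have explicitly exposed.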

Orchestrator

A system that coordinates multiple agents or LLM calls in sequence. LangChain, LlamaIndex, and n8n act as orchestrators in AI workflows.

MCP (Model Context Protocol)

An open protocol introduced by Anthropic in 2024 for connecting LLMs to external tools and data sources in a standardized way. Supported by Claude and by many IDEs such as Cursor.

Multi-agent system

A system where multiple AI agents collaborate to complete complex tasks divided into subtasks. A key trend in enterprise automation in 2026.

Vector database

A database that stores mathematical representations of text for semantic search. Pinecone, Weaviate, and Chroma are popular options. Essential for RAG systems.

Semantic search

Search that understands meaning rather than just matching exact keywords. A semantic system understands that "car" and "automobile" are equivalent.

Chunking

Splitting long documents into smaller fragments before storing them in a vector database. Chunk size directly affects RAG quality.
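
One common approach is fixed-size chunks with a small overlap, so a sentence cut at a boundary still appears whole in one of the chunks. The sketch below splits by characters for simplicity; the sizes are arbitrary example values, and many systems split by tokens, sentences, or headings instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```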

Evals (Evaluations)

Automated tests to measure model quality on specific tasks. The AI equivalent of unit tests in traditional software engineering.
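
At its simplest, an eval is a list of prompts with expected answers and a score. The sketch below uses exact-match scoring and a fake model that returns canned answers, both assumptions made to keep the example self-contained; real evals call an actual model and often use more forgiving scoring.

```python
def run_eval(model_fn, cases):
    """Return the fraction of cases where the model's answer matches exactly."""
    if not cases:
        raise ValueError("cases must not be empty")
    passed = sum(1 for prompt, expected in cases if model_fn(prompt).strip() == expected)
    return passed / len(cases)


def fake_model(prompt):
    """Stand-in for a real model call, with one deliberately wrong answer."""
    canned = {
        "2 + 2 =": "4",
        "Capital of France?": "Paris",
        "Opposite of hot?": "warm",
    }
    return canned.get(prompt, "")


cases = [
    ("2 + 2 =", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of hot?", "cold"),
]
print(f"Accuracy: {run_eval(fake_model, cases):.0%}")
```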

Bias

Systematic tendency of the model to produce skewed or stereotyped responses. Often comes from training data. Providers try to mitigate it with techniques such as RLHF and DPO.

Text-to-image

Generating images from a text description (prompt). Midjourney, DALL-E 3, and Stable Diffusion are the leading models in 2026.

Negative prompt

In image generation, text describing what you do NOT want to appear. Helps remove visual artifacts and improve composition.

Inpainting

A technique for editing only a selected area of an existing image while keeping the rest intact. Available in DALL-E 3, Adobe Firefly, and Stable Diffusion.

Text-to-video

Generating video from a text description. Sora (OpenAI), Runway Gen-3, and Kling are the most advanced models in 2026.

TTS (Text-to-Speech)

Voice synthesis from text. ElevenLabs and Murf AI offer ultra-realistic voices. Widely used in podcasts, YouTube videos, and voice interfaces.

STT (Speech-to-Text)

Automatic transcription of audio to text. OpenAI's Whisper is the leading open-source model. Otter.ai and Fireflies.ai use STT to transcribe meetings.

Voice cloning

Creating a synthetic voice that sounds like a real person using only a few seconds of sample audio. ElevenLabs is one of the best-known providers of this technology.

PII (Personally Identifiable Information)

Data that can identify a person (name, email, ID number). Critical not to send PII to LLM APIs without reviewing the provider's terms and compliance policies.
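
One precaution is to mask obvious identifiers before text leaves your system. The sketch below redacts email addresses with a simple pattern; it is an illustration only, since the pattern is not exhaustive and names, phone numbers, and ID numbers would need their own handling.

```python
import re

# Simplified pattern for illustration; it will not catch every valid address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub("[EMAIL]", text)


print(redact_emails("Contact ana@example.com today"))
```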

Prompt caching

A technique to reuse already-processed context, reducing costs and latency. Anthropic and OpenAI offer prompt caching with discounts of up to 90%.

AI wrapper

A product built on top of another model's API without proprietary model technology. Many 2023-2024 AI startups are wrappers over GPT-4 or Claude.

Foundation model

A large model trained on general data that serves as the base for multiple specific applications. GPT-4, Claude 3, and Gemini 1.5 are foundation models.

Frontier model

A model at the cutting edge of current AI capabilities. GPT-4o, Claude Opus, and Gemini Ultra compete to be the frontier model at any given time.

Compute

Processing power needed to train and run AI models. Measured in FLOPs. Access to compute (GPUs/TPUs) is the main bottleneck in the AI industry.

GPU (Graphics Processing Unit)

A chip designed to process thousands of operations in parallel. Essential for training and running LLMs. The NVIDIA H100 is the most used AI GPU in 2026.

Scaling laws

Empirical laws describing how models improve as data, parameters, and compute increase. The theoretical basis behind the race for larger models.

Emergent capabilities

Capabilities that appear in large models without being explicitly trained for them. Multi-step mathematical reasoning is a commonly cited example.

Alignment

Research field seeking to ensure AI models act in accordance with human values and intentions. Anthropic, OpenAI, and DeepMind have dedicated teams.

Constitutional AI

Anthropic's technique for training safe models using a set of principles (a "constitution") instead of exhaustive human feedback. Used to train Claude.

Guardrails

Safety barriers programmed into a model to prevent harmful, false, or inappropriate responses. All commercial models have guardrails at varying restriction levels.

Extended context window

The ability to process very long documents in a single call. Gemini 1.5 Pro handles 1 million tokens. Enables analyzing entire books or large codebases.

Streaming

Response mode where the model sends tokens one by one in real time instead of waiting to generate the full response. Makes chatbot experiences feel faster and more fluid.
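
The pattern looks like this from the application side: the response arrives as a sequence of small pieces, and each one is displayed as soon as it comes in. The generator below is a stand-in for a real API stream, and the token list is made up for the example.

```python
def stream_tokens(tokens):
    """Stand-in for an API stream: yields one piece of the response at a time."""
    for token in tokens:
        yield token


def render_stream(stream):
    """Show each piece as it arrives and return the full text at the end."""
    pieces = []
    for token in stream:
        print(token, end="", flush=True)  # display immediately, no waiting
        pieces.append(token)
    print()
    return "".join(pieces)


full_text = render_stream(stream_tokens(["Hel", "lo", ", ", "world"]))
```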

Temperature

Parameter that controls response randomness. Temperature 0 = nearly deterministic and predictable. Temperature 1 = creative and varied. Key for tuning model behavior.

Multimodal mode

A model's ability to process and generate different data types: text, images, audio, and video. GPT-4o, Gemini 1.5, and Claude 3 are multimodal.