Essential AI Dictionary
83 key terms to understand the conversation about artificial intelligence in 2026 — explained in plain English.
LLM
Large Language Model. A model trained on massive amounts of text to predict the next word. ChatGPT, Claude, and Gemini are LLMs.
Transformer
Neural network architecture introduced by Google in 2017 (in the 'Attention Is All You Need' paper). The foundation of practically all modern LLMs.
Fine-tuning
Retraining a pre-existing model with specific data to specialize it for a concrete task or style.
RAG
Retrieval-Augmented Generation. A technique where the model searches an external database for information before answering, reducing hallucinations.
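The retrieve-then-answer flow can be sketched in a few lines. This is a toy illustration only: the word-overlap scoring stands in for a real embedding search, and the function names (`retrieve`, `build_rag_prompt`) and sample documents are hypothetical.

```python
import re

def words(text):
    """Lowercase a text and return its set of words (punctuation dropped)."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Return the top_k documents sharing the most words with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda doc: len(q & words(doc)), reverse=True)
    return ranked[:top_k]

def build_rag_prompt(question, documents):
    """Put the retrieved text in front of the question before calling the model."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical knowledge base
documents = [
    "Refunds are accepted within 30 days of purchase.",
    "Our office is closed on public holidays.",
]
question = "How many days do I have to request a refund?"
prompt = build_rag_prompt(question, documents)
print(prompt)
```

The resulting prompt is what would be sent to the LLM, so the answer can draw on the retrieved text rather than on the model's memory alone.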
Hallucination
When a model invents false information but presents it confidently. The central risk of LLMs.
Token
The minimum unit processed by an LLM. One token corresponds to roughly 0.75 words in English. Providers bill and set limits in tokens.
Embedding
Numerical representation of text in a vector space. Allows searching by meaning instead of literal matching.
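A minimal sketch of how "searching by meaning" works once text is turned into vectors: similar meanings give vectors that point in similar directions, which cosine similarity measures. The three-dimensional vectors below are hand-made for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy embeddings (assumption: not produced by any real model)
embeddings = {
    "car": [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.00, 0.20, 0.95],
}

sim_close = cosine_similarity(embeddings["car"], embeddings["automobile"])
sim_far = cosine_similarity(embeddings["car"], embeddings["banana"])
print(f"car vs automobile: {sim_close:.3f}")
print(f"car vs banana:     {sim_far:.3f}")
```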
Prompt Engineering
The art of formulating instructions to get the best output from an LLM. Includes role, context, constraints, and formatting.
Zero-shot
Getting a model to perform a task without giving it prior examples. One of the most surprising abilities of LLMs.
Few-shot
Giving the model 2-5 examples of the desired output within the prompt itself so it can clone the pattern.
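As a rough sketch, a few-shot prompt is just the instruction, the worked examples, and the new input laid out in one consistent pattern. The `Input:`/`Output:` labels and the helper name `build_few_shot_prompt` are illustrative choices, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new query into one prompt."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open for the model to complete
    return "\n".join(lines)

examples = [
    ("The delivery was late and the box was damaged.", "negative"),
    ("Great quality, arrived a day early.", "positive"),
    ("Does what it says, nothing more.", "neutral"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review.",
    examples,
    "I would buy this again without hesitation.",
)
print(prompt)
```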
RLHF
Reinforcement Learning from Human Feedback. A technique to align the model's behavior with human preferences. This is what made ChatGPT viable.
Diffusion Model
A model that learns to generate images starting from random noise and refining it step-by-step. The basis of Midjourney, DALL-E, and Stable Diffusion.
LoRA
Low-Rank Adaptation. An efficient technique to fine-tune massive models: the original weights stay frozen and only a pair of small low-rank matrices, added to selected layers, is trained.
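The savings are easy to see with arithmetic. For a weight matrix of shape d_out x d_in, LoRA trains two factors of shapes d_out x r and r x d_in instead of the full matrix. The layer size (4096 x 4096) and rank (8) below are illustrative assumptions, not the settings of any particular model.

```python
def full_params(d_out, d_in):
    """Parameters in the full weight matrix."""
    return d_out * d_in

def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors: (d_out x rank) and (rank x d_in)."""
    return d_out * rank + rank * d_in

d_out, d_in, rank = 4096, 4096, 8  # illustrative sizes
full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full matrix:  {full:,} parameters")
print(f"LoRA factors: {lora:,} parameters ({lora / full:.2%} of the full matrix)")
```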
Quantization
Reducing the numerical precision of a model (from 32-bit to 8-bit or 4-bit) so it takes up less memory and runs faster.
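A minimal sketch of one common scheme, symmetric 8-bit quantization: every weight is divided by a shared scale and rounded to an integer between -127 and 127. Real quantizers (including the 4-bit formats) work per block or per channel and are more elaborate; the function names here are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 or 1.0  # fall back to 1.0 if all weights are zero
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.02, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized)
print(f"largest rounding error: {max_error:.6f}")
```

Each stored value now needs 1 byte instead of 4, at the price of a small rounding error bounded by half the scale.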
Multimodal
A model that processes several types of data at once: text, image, audio, video. GPT-4o, Claude Sonnet 4.5, and Gemini are multimodal.
Agent
An AI system that executes multi-step tasks autonomously: it investigates, decides, calls APIs, and writes code. Cursor Composer and Devin are examples.
Chain of Thought
A prompting technique asking the model to 'think step by step' before answering. Drastically improves complex reasoning.
Temperature
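A parameter that controls the randomness of responses. At 0 the model almost always picks the most probable token (near-deterministic output); higher values such as 1 produce more varied, creative output. 0.7 is a common default.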
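Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. The sketch below uses made-up logits for three candidate tokens; a low temperature concentrates probability on the top token, a high one flattens the distribution. A temperature of exactly 0 cannot be plugged into this formula and is typically handled as "always pick the top token".

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature (> 0)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```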
Top-p
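A sampling parameter used alongside or instead of Temperature: the model samples only from the smallest set of most probable tokens whose cumulative probability reaches p (e.g., p = 0.9 keeps the top candidates that together cover 90% of the probability and discards the rest).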
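A minimal sketch of the filtering step, assuming the next-token probabilities are already available as a dictionary (the tokens and values below are invented): sort by probability, keep tokens until the running total reaches p, then renormalize what is left.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose total reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    # Renormalize so the surviving probabilities sum to 1
    return {token: prob / total for token, prob in kept}

probs = {"the": 0.50, "a": 0.30, "an": 0.15, "xyz": 0.05}  # invented distribution
filtered = top_p_filter(probs, 0.9)
print(filtered)
```

The model then samples from `filtered`, so the unlikely tail ("xyz" here) can never be chosen.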
Context Window
The maximum number of tokens a model can take into account at once, including the conversation so far. Claude Sonnet 4.5: 200K. Gemini 1.5 Pro: up to 2M.
Inference
The act of running a trained model to get a response. Distinct from training.
Benchmark
A standardized test to measure a model's capacity: MMLU (general knowledge), HumanEval (code), GSM8K (math), etc.
GGUF
File format for storing quantized models, optimized for local inference on CPU/GPU. Replaced the old GGML.
Tokenizer
The component that splits text into tokens. Different models use different tokenizers, so the same text has different lengths depending on the model.
Attention
A mechanism that allows the model to 'look' at other parts of the text when processing each word. The heart of the Transformer.
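A small sketch of the standard scaled dot-product form for a single query: score the query against every key, turn the scores into weights with a softmax, and return the weighted average of the values. The two-dimensional vectors are toy numbers; real models use learned projections and many attention heads in parallel.

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy vectors: the first key points the same way as the query
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights])
```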
GPT
Generative Pre-trained Transformer. OpenAI's family of models: GPT-3.5, GPT-4, GPT-4o, GPT-4.1.
BERT
Google's Transformer model (2018) optimized for understanding text (not generating it). Google began using it in Search in 2019.
Stable Diffusion
An open-source image generation model that can run locally. The open-source rival to Midjourney and DALL-E.
GAN
Generative Adversarial Network. Two competing networks: one generates, another discriminates. Precursor to diffusion models for imagery.
Neural Network
A computing system inspired by the brain: nodes ('neurons') connected by adjustable weights. The basis of all modern deep learning.
Base model
A model trained on massive data without task-specific fine-tuning. It only continues text and does not reliably follow instructions. Assistants such as GPT-4o and Claude 3 are built from base models through instruction tuning and RLHF.
Reasoning model
An LLM designed to think step-by-step before answering. OpenAI o1 and Claude 3.7 Sonnet are examples. Slower but more accurate on complex problems.
Parameters
The internal numbers a model adjusts during training. More parameters generally means more capacity. Unconfirmed estimates put GPT-4 at ~1.8 trillion parameters.
Mixture of Experts (MoE)
Architecture that divides parts of the model into specialized sub-networks ("experts") and activates only a few of them for each token. Mixtral uses MoE, and GPT-4 is widely reported (though not confirmed) to use it, to be more compute-efficient.
Open source model
A model whose code and weights are publicly available. Llama 3 (Meta) and Mistral are the most popular. You can run them locally without paying for an API.
Proprietary model
A model whose code or weights are not public. GPT-4o, Claude, and Gemini are proprietary. You access them only via API or the provider's interface.
SLM (Small Language Model)
A small language model designed to run on devices with limited memory. Microsoft's Phi-3 and Google's Gemma are SLMs capable of running on mobile devices.
Pre-training
The first phase of training where a model learns to predict text from massive datasets of internet text, books, and code.
Instruction tuning
Fine-tuning where the model is trained to follow instructions in a question-answer format. Converts a base model into a useful assistant.
DPO (Direct Preference Optimization)
A more efficient alternative to RLHF for aligning models with human preferences. Many modern open-source models use DPO instead of RLHF.
Synthetic data
Training data generated by AI rather than collected from humans. Used to train models when real data is scarce or expensive to collect.
API (Application Programming Interface)
Interface that lets your application send text to a model and receive responses. You pay per token consumed. OpenAI, Anthropic, and Google offer model APIs.
Latency
Time for the model to start responding. Critical in real-time applications. Groq stands out for ultra-low latency using specialized hardware (LPUs).
Cost per token
Price charged by the API for the tokens processed, usually quoted per million tokens, with separate rates for input and output. GPT-4o Mini costs ~$0.15/million input tokens. Key for calculating your application's ROI.
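A back-of-the-envelope cost calculation, as a sketch. The input price is the figure quoted in this entry; the output price and the token volumes are placeholder assumptions, so check the provider's current price list before relying on the result.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_million, output_price_per_million):
    """Estimated cost in USD for a given volume of input and output tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost

# 2M input tokens at $0.15/M (from this entry), 500K output tokens at an
# assumed $0.60/M (placeholder value)
cost = estimate_cost_usd(2_000_000, 500_000, 0.15, 0.60)
print(f"${cost:.2f}")
```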
Self-hosting
Running a model on your own infrastructure instead of using a provider's API. More control and privacy, but requires GPUs and technical expertise.
System prompt
Instructions you define for the model before the conversation starts. Sets its role, tone, and constraints. Invisible to end users in most apps.
In-context learning
The LLM's ability to learn from examples within the prompt itself without retraining. Adding a few well-chosen examples (few-shot) generally helps it follow the pattern more closely.
Prompt injection
An attack where a user, or text the model reads (a web page, an email, a document), tries to override the system prompt to make the model ignore its instructions. The main security vulnerability in LLM-powered apps.
Jailbreak
A technique to bypass a model's safety restrictions using deceptive instructions. Providers continuously patch these attempts.
Grounding
The process of anchoring model responses to verifiable facts, typically via RAG or real-time web search. Dramatically reduces hallucinations.
Agentic AI
AI that doesn't just answer questions but takes autonomous actions: browsing the web, executing code, calling external APIs. A key trend in 2026.
Function calling
The model's ability to invoke external functions (search Google, query a database, send an email) instead of just generating text.
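In practice the model does not run anything itself: it returns a structured request naming a function and its arguments, and the application executes it. The sketch below shows the application side. The JSON shape, the `TOOLS` registry, and the stubbed `get_weather` function are all hypothetical; each provider defines its own exact format.

```python
import json

def get_weather(city):
    """Stub tool: a real one would call a weather service."""
    return f"Sunny in {city}"

# Registry of functions the application allows the model to call
TOOLS = {"get_weather": get_weather}

def run_tool_call(raw_call):
    """Parse the model's tool request and run the matching local function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Example of what a model might emit (hypothetical format)
model_output = '{"name": "get_weather", "arguments": {"city": "Madrid"}}'
result = run_tool_call(model_output)
print(result)
```

The result is then sent back to the model so it can use it in its final answer.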
Orchestrator
A system that coordinates multiple agents or LLM calls in sequence. LangChain, LlamaIndex, and n8n act as orchestrators in AI workflows.
MCP (Model Context Protocol)
An open protocol introduced by Anthropic in 2024 for connecting LLMs to external tools and data sources securely. Adopted by Claude and many IDEs like Cursor.
Multi-agent system
A system where multiple AI agents collaborate to complete complex tasks divided into subtasks. A key trend in enterprise automation in 2026.
Vector database
A database that stores mathematical representations of text for semantic search. Pinecone, Weaviate, and Chroma are popular options. Essential for RAG systems.
Semantic search
Search that understands meaning rather than just matching exact keywords. A semantic system understands that "car" and "automobile" are equivalent.
Chunking
Splitting long documents into smaller fragments before storing them in a vector database. Chunk size directly affects RAG quality.
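A minimal sketch of fixed-size chunking with overlap, counting in words for simplicity (production pipelines usually count tokens and try to respect sentence or paragraph boundaries). The overlap repeats the end of one chunk at the start of the next so that a sentence cut at the boundary is not lost. The function name and sizes are illustrative.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into chunks of chunk_size words, sharing overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

text = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
chunks = chunk_words(text, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```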
Evals (Evaluations)
Automated tests to measure model quality on specific tasks. The AI equivalent of unit tests in traditional software engineering.
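The simplest kind of eval is exact-match accuracy: run the model on a set of prompts with known answers and count how many it gets right. In this sketch `fake_model` is a hypothetical stand-in for a real model call, and the test cases are invented.

```python
def exact_match_accuracy(model_fn, cases):
    """Fraction of cases where the model's answer equals the expected answer."""
    correct = sum(
        1 for prompt, expected in cases
        if model_fn(prompt).strip() == expected
    )
    return correct / len(cases)

def fake_model(prompt):
    """Stand-in for an LLM call; it gets the last question wrong on purpose."""
    answers = {
        "2 + 2 =": "4",
        "Capital of France?": "Paris",
        "Opposite of hot?": "warm",
    }
    return answers.get(prompt, "")

cases = [
    ("2 + 2 =", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of hot?", "cold"),
]
score = exact_match_accuracy(fake_model, cases)
print(f"accuracy: {score:.0%}")
```

Running the same cases after every prompt or model change shows whether quality went up or down, much like a regression test suite.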
Bias
Systematic tendency of the model to produce skewed or stereotyped responses. Often comes from training data. Providers mitigate it with RLHF and DPO.
Text-to-image
Generating images from a text description (prompt). Midjourney, DALL-E 3, and Stable Diffusion are the leading models in 2026.
Negative prompt
In image generation, text describing what you do NOT want to appear. Helps remove visual artifacts and improve composition.
Inpainting
A technique for editing only a selected area of an existing image while keeping the rest intact. Available in DALL-E 3, Adobe Firefly, and Stable Diffusion.
Text-to-video
Generating video from a text description. Sora (OpenAI), Runway Gen-3, and Kling are the most advanced models in 2026.
TTS (Text-to-Speech)
Voice synthesis from text. ElevenLabs and Murf AI offer ultra-realistic voices. Widely used in podcasts, YouTube videos, and voice interfaces.
STT (Speech-to-Text)
Automatic transcription of audio to text. OpenAI's Whisper is the leading open-source model. Otter.ai and Fireflies.ai use STT to transcribe meetings.
Voice cloning
Creating a synthetic voice that sounds like a real person using only a few seconds of sample audio. ElevenLabs is one of the best-known providers of this technology.
PII (Personally Identifiable Information)
Data that can identify a person (name, email, ID number). Critical not to send PII to LLM APIs without reviewing the provider's terms and compliance policies.
Prompt caching
A technique to reuse already-processed context, reducing costs and latency. Anthropic and OpenAI offer prompt caching with discounts of up to 90%.
AI wrapper
A product built on top of another model's API without proprietary model technology. Many 2023-2024 AI startups are wrappers over GPT-4 or Claude.
Foundation model
A large model trained on general data that serves as the base for multiple specific applications. GPT-4, Claude 3, and Gemini 1.5 are foundation models.
Frontier model
A model at the cutting edge of current AI capabilities. GPT-4o, Claude Opus, and Gemini Ultra compete to be the frontier model at any given time.
Compute
Processing power needed to train and run AI models. Measured in FLOPs. Access to compute (GPUs/TPUs) is the main bottleneck in the AI industry.
GPU (Graphics Processing Unit)
A chip designed to process thousands of operations in parallel. Essential for training and running LLMs. The NVIDIA H100 is the most used AI GPU in 2026.
Scaling laws
Empirical laws describing how models improve as data, parameters, and compute increase. The theoretical basis behind the race for larger models.
Emergent capabilities
Capabilities that appear in large models without being explicitly trained for them. Multi-step arithmetic and reasoning are commonly cited examples.
Alignment
Research field seeking to ensure AI models act in accordance with human values and intentions. Anthropic, OpenAI, and DeepMind have dedicated teams.
Constitutional AI
Anthropic's technique for training safe models using a set of principles (a "constitution") instead of exhaustive human feedback. Used to train Claude.
Guardrails
Safety barriers programmed into a model to prevent harmful, false, or inappropriate responses. All commercial models have guardrails at varying restriction levels.
Extended context window
The ability to process very long documents in a single call. Gemini 1.5 Pro handles 1 million tokens. Enables analyzing entire books or large codebases.
Streaming
Response mode where the model sends tokens one by one in real time instead of waiting to generate the full response. Makes chatbot experiences feel faster and more fluid.
Temperature
Parameter that controls response randomness. Temperature 0 = deterministic and predictable. Temperature 1 = creative and varied. Key for tuning model behavior.
Multimodal mode
A model's ability to process and generate different data types: text, images, audio, and video. GPT-4o, Gemini 1.5, and Claude 3 are multimodal.