What is RAG?

Retrieval-Augmented Generation gives AI access to external knowledge. Like having a research assistant who can look things up before answering.


Imagine you're taking an exam, but instead of memorizing everything beforehand, you're allowed to bring reference books and look things up during the test.

That's basically what RAG (Retrieval-Augmented Generation) does for AI.

The problem RAG solves

Large language models (LLMs) are trained on massive datasets, but they have a few limitations:

Their knowledge has a cutoff date. The original GPT-4, for example, was trained on data up to about September 2021. No model knows what happened yesterday.
They can't access private information. They weren't trained on your company's internal documents or personal files.
They sometimes hallucinate. When they don't know something, they might confidently make things up.
Their knowledge is "baked in." You can't update what they know without retraining the entire model.

RAG addresses each of these limitations by giving the AI the ability to look things up, although it reduces hallucinations rather than eliminating them.

How RAG works

Think of RAG as a two-step process:

Step 1: Retrieve (the "R" in RAG)

When you ask a question, the system first searches through a database of documents to find relevant information.

"What's our return policy?" → Search company documents → Find the returns policy page

Step 2: Generate (the "G" in RAG)

The AI takes your original question AND the retrieved documents, then generates an answer based on both.

It's like asking a human assistant who checks the filing cabinet before responding.

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  WITHOUT RAG                                                            │
│  ━━━━━━━━━━━━                                                           │
│                                                                         │
│  User Question ──► AI Model ──► Generated Answer                        │
│                       │                                                 │
│                       └── Based only on training data                   │
│                                                                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  WITH RAG                                                               │
│  ━━━━━━━━━                                                              │
│                                                                         │
│  User Question ──► Search Documents ──► Retrieve Info                   │
│                           │                   │                         │
│                           ▼                   ▼                         │
│                  [Document Database]   [Relevant Docs]                  │
│                                               │                         │
│                                               ▼                         │
│                                           AI Model ──► Enhanced Answer  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
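
To make the two steps concrete, here is a minimal sketch in Python. It is not how production systems are built: the sample documents are made up, the retriever simply counts shared words instead of using semantic search, and `generate` is a hypothetical placeholder that only assembles the prompt a real model would receive.

```python
import re

# Made-up sample documents standing in for a company knowledge base.
DOCUMENTS = [
    "Return policy: items can be returned within 30 days with a receipt.",
    "Shipping: standard orders arrive in 3 to 5 business days.",
    "Warranty: electronics carry a one-year limited warranty.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Step 1: rank documents by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:top_k]


def generate(question: str, context: list[str]) -> str:
    """Step 2 (placeholder): build the prompt; a real system would send it to a model."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"


question = "What's our return policy?"
prompt = generate(question, retrieve(question, DOCUMENTS))
print(prompt)
```

The point is the shape of the pipeline: the question goes to a search step first, and whatever comes back is handed to the model together with the question.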

A concrete example

Let's say you ask: "What's the weather like in Tokyo?"

Without RAG: The model might say something like "I don't have access to current weather data" or make up weather information based on typical patterns it learned.

With RAG:

The system searches current weather APIs
Finds: "Tokyo: 22°C, partly cloudy, humidity 65%"
The AI generates: "Currently, Tokyo is experiencing partly cloudy conditions with a temperature of 22°C (72°F) and humidity at 65%. It's quite pleasant weather today."

The AI's response is grounded in real, current data.

The technical pieces

Document Storage

RAG systems need a database of documents to search through. This could be:

Company wikis and documentation
Recent news articles
Product catalogs
Legal databases
Personal notes and files

Semantic Search

RAG doesn't just search for exact keyword matches. It uses embeddings to find documents that are conceptually related to your question.

Asking "how to cancel" might retrieve documents about "refunds," "returns," and "account closure" even if they don't contain the word "cancel."
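
Here is a rough sketch of the idea, assuming each piece of text has already been converted into a vector. The three-dimensional vectors below are hand-written for illustration only; a real embedding model produces vectors with hundreds or thousands of dimensions. Cosine similarity is one common way to measure how close two vectors are.

```python
import math

# Hand-written toy vectors standing in for real embeddings (illustrative only).
TOY_EMBEDDINGS = {
    "refunds": [0.9, 0.1, 0.0],
    "account closure": [0.7, 0.3, 0.1],
    "shipping times": [0.1, 0.9, 0.2],
}

# Pretend embedding of the question "how to cancel".
QUERY_VECTOR = [0.8, 0.2, 0.0]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Rank document topics from most to least similar to the query.
ranked = sorted(
    TOY_EMBEDDINGS,
    key=lambda name: cosine_similarity(QUERY_VECTOR, TOY_EMBEDDINGS[name]),
    reverse=True,
)
print(ranked)
```

With these toy numbers, "refunds" and "account closure" rank above "shipping times" even though none of the labels contains the word "cancel".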

Context Integration

The AI model receives both your original question and the retrieved documents as context. It is usually instructed, through the prompt, to prioritize the retrieved information over its training data when the two conflict.
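
One way this can look in practice is a prompt template that places the retrieved documents ahead of the question. The sketch below is an assumption about how such a prompt might be worded, not a standard format. The function name `build_prompt` and the instruction text are hypothetical.

```python
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Combine instructions, numbered sources, and the question into one prompt."""
    numbered = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        "Answer using the sources below. If they conflict with what you "
        "already know, trust the sources. If they do not contain the "
        "answer, say so.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


example = build_prompt(
    "What's our return policy?",
    ["Items can be returned within 30 days.", "Refunds go to the original card."],
)
print(example)
```

Numbering the sources also makes it easier to ask the model to cite which document an answer came from.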

Types of RAG

Basic RAG

Retrieve, then generate. The simple two-step process described above.

Advanced RAG

More sophisticated approaches that might:

Search multiple times during generation
Verify information across sources
Rank and filter retrieved documents
Chain multiple retrieval steps
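
As an illustration of the "rank and filter" idea, the sketch below drops weak matches and keeps only the best few. The scores, the threshold of 0.5, and the `Hit` type are all invented for the example. Real systems often use a separate reranking model to produce the scores.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A retrieved document paired with a relevance score (higher is better)."""
    text: str
    score: float


def rank_and_filter(hits: list[Hit], min_score: float = 0.5, top_k: int = 2) -> list[Hit]:
    """Discard hits below min_score, then return the top_k highest-scoring ones."""
    kept = [h for h in hits if h.score >= min_score]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:top_k]


# Made-up search results with made-up scores.
hits = [
    Hit("Shipping times", 0.31),
    Hit("Refund steps", 0.92),
    Hit("Account closure", 0.74),
    Hit("Return window", 0.66),
]

best = rank_and_filter(hits)
print([h.text for h in best])
```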

Agentic RAG

The AI can decide when and what to search for. It might:

Ask follow-up questions
Search multiple databases
Synthesize information from different sources
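
The decision step can be pictured with a toy router. In a real agentic system the model itself decides whether and where to search. Here that decision is faked with simple keyword checks, and the knowledge bases, their contents, and the function names are all hypothetical.

```python
# Hypothetical knowledge bases: name -> {topic keyword: fact}.
KNOWLEDGE_BASES = {
    "policies": {"return": "Returns are accepted within 30 days."},
    "products": {"battery": "The battery lasts about 10 hours."},
}


def choose_sources(question: str) -> list[str]:
    """Stand-in for the model's decision about which databases to search."""
    q = question.lower()
    return [
        name
        for name, kb in KNOWLEDGE_BASES.items()
        if any(topic in q for topic in kb)
    ]


def search(kb_name: str, question: str) -> list[str]:
    """Return facts from one knowledge base whose topic appears in the question."""
    q = question.lower()
    return [fact for topic, fact in KNOWLEDGE_BASES[kb_name].items() if topic in q]


def plan(question: str) -> dict:
    """Decide whether to search, and gather evidence from every chosen source."""
    sources = choose_sources(question)
    if not sources:
        return {"action": "answer_directly", "evidence": []}
    evidence = [fact for name in sources for fact in search(name, question)]
    return {"action": "answer_with_evidence", "evidence": evidence}


print(plan("Hello there"))
print(plan("Can I return it if the battery dies?"))
```

The second question touches two topics, so the router searches both databases and passes the combined evidence on for synthesis.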

Real-world applications

Customer support: RAG systems can answer questions by retrieving information from knowledge bases, recent tickets, and product documentation.

Legal research: Find relevant cases and statutes for specific legal questions.

Medical assistance: Access current research papers and medical guidelines.

Internal company Q&A: Answer employee questions using internal documentation, policies, and procedures.

News and analysis: Provide up-to-date information by retrieving recent news articles.

For example, take an internal company question: "What are the side effects of the new diabetes medication our company is developing?"

RAG system:

Searches internal clinical trial documents
Finds relevant safety reports
Generates answer: "Based on our Phase II trial results from last month, the most common side effects include mild nausea (12% of patients) and temporary dizziness (8% of patients). No serious adverse events have been reported in the 200-person trial group."

Without RAG: "I don't have information about your specific medication development."

The challenges

Retrieval quality: If the search doesn't find the right documents, the AI can't give good answers. Garbage in, garbage out.

Context limits: AI models have limited context windows. You can only include so many retrieved documents.
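
A simple way to respect that limit is to add documents, best first, until a budget is used up. The sketch below is deliberately crude: it estimates tokens by counting whitespace-separated words, which is only a rough assumption. Real systems count tokens with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: one token per whitespace-separated word."""
    return len(text.split())


def pack_context(docs: list[str], budget: int) -> list[str]:
    """Keep documents in ranked order until the next one would exceed the budget."""
    packed: list[str] = []
    used = 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        packed.append(doc)
        used += cost
    return packed


# Documents are assumed to be sorted from most to least relevant.
docs = ["one two three", "four five", "six seven eight nine"]
print(pack_context(docs, budget=5))
```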

Conflicting information: What if different documents say different things? The AI needs strategies for handling contradictions.
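
One possible strategy, among many, is to prefer newer documents and show the model each source's date so it can weigh them. The sketch below assumes every document carries a last-updated date. The documents and field names are made up for illustration.

```python
from datetime import date

# Made-up documents that disagree about the same policy.
retrieved = [
    {"text": "Returns are accepted within 14 days.", "updated": date(2022, 3, 1)},
    {"text": "Returns are accepted within 30 days.", "updated": date(2024, 6, 15)},
]


def order_by_recency(docs: list[dict]) -> list[dict]:
    """Newest first, so the most recent statement leads the context."""
    return sorted(docs, key=lambda d: d["updated"], reverse=True)


def format_with_dates(docs: list[dict]) -> str:
    """Label each source with its date so the model can weigh the sources."""
    return "\n".join(f"({d['updated'].isoformat()}) {d['text']}" for d in docs)


context = format_with_dates(order_by_recency(retrieved))
print(context)
```

Recency is not always the right tiebreaker. Some systems instead rank by source authority or ask the model to point out the disagreement explicitly.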

Performance: Searching large document databases and processing long contexts takes time and computational resources.

Data freshness: The retrieved information is only as current as the last time the database was updated.

RAG vs Fine-tuning

Both fine-tuning and RAG customize AI for specific use cases, but they work differently:

Fine-tuning bakes knowledge into the model's parameters. It's like intensive training to become a specialist.

RAG gives the model access to external information. It's like having a reference library.

When to use fine-tuning:

You need consistent style or behavior
The knowledge is stable and doesn't change much
You want faster response times

When to use RAG:

Your information changes frequently
You need to cite sources
You want to add new information without retraining
You have multiple knowledge bases to search

Many systems use both: fine-tune for style and behavior, RAG for current information.

Why RAG matters

RAG makes AI much more practical for real-world applications. Instead of AI being limited to what it learned during training, it can access and reason about current, private, and specialized information.

It's the difference between talking to someone who only knows what they learned in school versus someone who can also research and look things up to give you current, accurate information.

The bottom line: RAG turns AI from a static knowledge base into a dynamic research assistant. It can find, synthesize, and reason about information it was never explicitly trained on, making AI far more useful for practical applications.

RAG helps AI access external knowledge. But first, that knowledge needs to be converted into a format AI can search. Next: What are Embeddings?, the technology that makes semantic search possible.
