Part 2 of the SAP Business AI Series
The phrase “powered by AI” appears everywhere — in product announcements, implementation decks, and executive briefings. But when someone says SAP Joule is powered by Large Language Models with RAG grounding, what does that actually mean? And why does it matter for how you implement, trust, and extend these capabilities?
This post goes deeper on the core technologies: how LLMs work, why Generative AI is different from previous AI approaches, what RAG solves and how, and how the Model Context Protocol enables AI agents to interact with the outside world.
Large Language Models: The Engine Under the Hood
A Large Language Model is a deep learning system trained on massive text datasets to understand and generate human language. The “large” refers to both the scale of training data (trillions of tokens from books, websites, code, and documentation) and the number of parameters — the internal numerical weights the model adjusts as it learns, typically running into the billions.
LLMs are built on transformer architecture, which introduced a key innovation called the self-attention mechanism. Before transformers, language models processed text sequentially — word by word — which made it difficult to capture relationships between words that were far apart in a sentence. Self-attention lets the model evaluate every word in relation to every other word simultaneously, regardless of position. This is what allows an LLM to understand that in the sentence “The contract was signed by the vendor, but the buyer disputed it,” the word “it” refers to “the contract” — not “the vendor” or “the buyer.”
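To make that concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core computation inside a transformer layer. The token vectors are toy values chosen for illustration, and the learned query, key, and value projections of a real transformer are omitted for brevity:

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over a sequence of token vectors.

    x has shape (seq_len, d) -- one row per token. For clarity the query,
    key, and value projections are identities; a real transformer learns
    a separate weight matrix for each.
    """
    d = x.shape[-1]
    q, k, v = x, x, x                       # learned projections omitted
    scores = q @ k.T / np.sqrt(d)           # every token scored against every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per token
    return weights @ v                      # weighted mix of all positions

# Three toy "token" vectors: attention relates them regardless of distance.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(self_attention(tokens))
```

Because every token attends to every other token in a single step, distance in the sentence no longer matters; that is the property the "it refers to the contract" example relies on.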
How LLMs Are Trained
Training an LLM involves exposing it to vast amounts of text and asking it to predict the next word or token, billions of times over. Through a feedback mechanism called backpropagation, the model adjusts its internal weights each time it gets a prediction wrong, gradually learning the statistical patterns of language — grammar, style, factual associations, reasoning structures, and more.
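Here is a deliberately tiny sketch of that objective, with a single weight matrix standing in for a full transformer. Everything below is illustrative, but the loop is the same in spirit: predict the next token, measure the error, adjust the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "contract", "was", "signed"]
V = len(vocab)

# A toy "model": one weight matrix mapping current token -> next-token logits.
# A real LLM stacks billions of weights, but the objective is the same.
W = rng.normal(scale=0.1, size=(V, V))

def step(current: int, target: int, lr: float = 0.5) -> float:
    """One next-token prediction plus a backpropagation-style update."""
    logits = W[current]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                    # softmax over the vocabulary
    loss = -np.log(probs[target])           # cross-entropy: surprise at the truth
    grad = probs.copy()
    grad[target] -= 1.0                     # gradient of the loss w.r.t. logits
    W[current] -= lr * grad                 # nudge weights toward the target
    return loss

# Repeatedly predict "contract" after "the"; the loss falls as weights adapt.
for epoch in range(5):
    print(round(step(vocab.index("the"), vocab.index("contract")), 3))
```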
LLMs do not memorise text. They learn patterns. This distinction matters: it is why an LLM can produce coherent, contextually appropriate responses to questions it has never specifically encountered — and also why it can sometimes produce plausible-sounding but factually incorrect responses, a problem known as hallucination.
Fine-Tuning for Enterprise Use
A base LLM trained on general internet text does not naturally understand SAP-specific terminology — concepts like profit centres, material masters, purchase requisitions, or IDoc message types. Fine-tuning addresses this by continuing the training process on a smaller, domain-specific dataset. The same transformer architecture is reused, but internal weights are adjusted to align the model with specialised language, tone, and objectives.
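As a rough illustration of what "continuing the training process" looks like in practice, here is a sketch using the open-source Hugging Face libraries. The base model, the corpus file name, and the settings are placeholders, and SAP's actual fine-tuning pipeline is not public; this only shows the general shape of the technique.

```python
# Sketch: continued pre-training on domain text with Hugging Face tooling.
# "gpt2" is a stand-in base model; sap_domain_corpus.txt is hypothetical.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# A domain corpus full of terms like "purchase requisition" and
# "material master" (hypothetical file).
dataset = load_dataset("text", data_files={"train": "sap_domain_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False -> the same next-token (causal) objective as pre-training
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # same architecture, weights nudged toward domain language
```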
SAP fine-tunes models specifically for business contexts — finance language, supply chain terminology, HR processes — so that Joule and other AI capabilities produce outputs that make sense in the context of SAP applications, not just general conversations.
The Hallucination Problem and Why It Matters
Hallucination is the phenomenon where an LLM generates information that sounds authoritative and coherent but is factually wrong. It happens because LLMs generate responses based on statistical probability — the most likely next word given the context — not based on verified factual recall. When the model is uncertain, it can produce a confident-sounding guess.
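A toy illustration of why that happens. The candidate tokens and probabilities below are invented; the point is that sampling from a probability distribution always yields a fluent answer, with no built-in signal of uncertainty:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical next-token probabilities after "Q3 revenue was ...".
# The model has no verified fact here, only statistics -- yet it will
# still emit something, and the output carries no uncertainty marker.
candidates = ["$4.2M", "$3.8M", "$5.1M", "unknown"]
probs = np.array([0.31, 0.29, 0.27, 0.13])

token = rng.choice(candidates, p=probs)
print(f"Q3 revenue was {token}")   # fluent, confident, possibly wrong
```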
For consumer AI, a hallucination about a historical date is an annoyance. For enterprise AI answering questions about Q3 revenue, supplier compliance status, or inventory levels, a hallucination is a business risk. This is the problem that Retrieval-Augmented Generation was designed to solve.
Retrieval-Augmented Generation (RAG): Grounding AI in Real Data
RAG is one of the most important architectural patterns in enterprise AI. It solves two fundamental limitations of standard LLMs: they have a knowledge cutoff (they do not know what happened after their training data ends) and they have no access to private business data (your SAP transactions, contracts, or internal documents).
RAG changes this by inserting a retrieval step before the generation step (a minimal code sketch follows the list):
- User submits a query — “What are our open purchase orders for Supplier X?”
- Retrieval step — the system searches a live knowledge base, database, or document store for relevant information
- Augmentation — retrieved content is combined with the original query into a structured prompt
- Generation step — the LLM uses both its trained knowledge and the retrieved data to generate a response grounded in real, current information
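Here is the promised sketch of those four steps. `search_documents` and `llm_generate` are hypothetical stubs standing in for a real retriever and a real model endpoint, and the purchase-order lines are invented placeholders:

```python
# A minimal sketch of the four RAG steps, with stubbed dependencies.

def search_documents(query: str) -> list[str]:
    """Retrieval step: look up live business data (stubbed here)."""
    return ["PO 4500012345 | Supplier X | open | 2,000 EUR",
            "PO 4500012399 | Supplier X | open | 750 EUR"]

def llm_generate(prompt: str) -> str:
    """Generation step: call the LLM (stubbed here)."""
    return f"(model answers using only the prompt below)\n{prompt}"

def rag_answer(query: str) -> str:
    context = search_documents(query)                  # 2. retrieval
    prompt = ("Answer strictly from the context; cite the source lines.\n"
              "Context:\n" + "\n".join(context) +      # 3. augmentation
              f"\n\nQuestion: {query}")
    return llm_generate(prompt)                        # 4. generation

print(rag_answer("What are our open purchase orders for Supplier X?"))  # 1. query
```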
The impact on output quality is dramatic. Instead of the model guessing or drawing on general training knowledge, it is answering based on actual data pulled from your SAP system moments before the response was generated. Fewer hallucinations, higher accuracy, and crucially — responses that can be traced back to a source.
RAG in the SAP Context
SAP Joule and SAP AI Core use RAG to fetch live business data from S/4HANA, SuccessFactors, SAP Ariba, and other applications before generating answers. When a CFO asks Joule about Q3 revenue by region, the system does not rely on anything the model was trained on. It retrieves current financial data from the connected SAP system and frames the response around that specific, verified information.
This makes RAG the foundation for trustworthy enterprise AI — not just impressive AI.
Vector Embeddings and Semantic Search
For RAG to work, the retrieval step needs to find relevant information quickly and accurately — not just by matching keywords, but by understanding meaning. This is where vector embeddings come in.
A vector embedding converts text into a high-dimensional numerical representation that captures semantic meaning. Sentences with similar meanings produce vectors that are close together in this high-dimensional space, even if they use completely different words. “Show me overdue invoices” and “List unpaid bills past their due date” will produce similar vectors — so a vector search will recognise both as the same intent and retrieve the same relevant content.
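A small sketch of how that similarity is computed. The four-dimensional vectors are invented for illustration; production embedding models output hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors: close to 1.0 = similar meaning."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings, invented so the two "overdue invoice" phrasings land
# close together and the unrelated sentence lands far away.
overdue_invoices = np.array([0.9, 0.1, 0.8, 0.0])
unpaid_bills     = np.array([0.8, 0.2, 0.9, 0.1])
team_lunch       = np.array([0.0, 0.9, 0.1, 0.8])

print(cosine_similarity(overdue_invoices, unpaid_bills))  # high -> same intent
print(cosine_similarity(overdue_invoices, team_lunch))    # low  -> different topic
```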
SAP HANA’s Vector Engine stores and processes these embeddings directly inside the database, enabling enterprise-grade semantic search at scale without requiring a separate vector database. It is the retrieval infrastructure that powers document grounding in SAP’s AI stack.
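For a feel of what querying the Vector Engine looks like, here is a hedged sketch using SAP's hdbcli Python client and the HANA Cloud vector functions TO_REAL_VECTOR and COSINE_SIMILARITY. The table, columns, and credentials are invented placeholders, and the exact SQL should be checked against the HANA Cloud documentation for your release:

```python
# Sketch: semantic search against SAP HANA's vector engine via hdbcli.
# Connection details, table DOCUMENT_CHUNKS, and its columns are placeholders.
from hdbcli import dbapi

conn = dbapi.connect(address="myhana.example.com", port=443,
                     user="AI_USER", password="***")

query_embedding = "[0.12, -0.45, 0.88]"  # produced by an embedding model

cursor = conn.cursor()
cursor.execute(
    """SELECT TOP 5 doc_id, doc_text,
              COSINE_SIMILARITY(embedding, TO_REAL_VECTOR(?)) AS score
       FROM DOCUMENT_CHUNKS
       ORDER BY score DESC""",
    (query_embedding,))
for doc_id, text, score in cursor.fetchall():
    print(doc_id, round(score, 3), text[:60])
```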
Model Context Protocol (MCP): Connecting AI to the World
RAG allows AI to read from external sources. But what about acting on them? For AI agents to create purchase orders, update employee records, or trigger workflows, they need a standardised way to communicate with external systems. That is what the Model Context Protocol provides.
MCP is an open standard that defines how AI agents connect securely and dynamically to tools, data sources, and business systems. It distinguishes between three types of capabilities, each illustrated in the sketch after this list:
- Tools — executable functions with side effects. An agent calling a “send invoice” tool actually sends an invoice in the connected system.
- Resources — data sources for information retrieval. An agent accessing a “customer history” resource pulls real customer data from SAP Sales Cloud.
- Prompts — structured instructions that ensure AI-generated content follows consistent standards, such as a quarterly report template that keeps summaries within legal boundaries.
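Here is the sketch referenced above: a minimal MCP server exposing one capability of each type, written against the open-source MCP Python SDK (`pip install mcp`). The business logic and the names send_invoice, customer_history, and quarterly_report are hypothetical stubs:

```python
# Sketch of an MCP server with one tool, one resource, and one prompt.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sap-demo")

@mcp.tool()
def send_invoice(customer_id: str, amount: float) -> str:
    """Tool: an executable action with side effects in the backend."""
    return f"Invoice for {amount} sent to customer {customer_id}"  # stub

@mcp.resource("customers://{customer_id}/history")
def customer_history(customer_id: str) -> str:
    """Resource: read-only data retrieval, no side effects."""
    return f"Order history for customer {customer_id}: ..."  # stub

@mcp.prompt()
def quarterly_report(region: str) -> str:
    """Prompt: a reusable template that standardises generated output."""
    return f"Summarise Q3 results for {region} in three compliant bullet points."

if __name__ == "__main__":
    mcp.run()  # speaks the MCP wire protocol, over stdio by default
```

Any MCP-compatible agent can discover and call these capabilities without custom integration code, which is the point made below.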
The significance of MCP for enterprise AI is that it breaks down the integration barrier. Instead of building custom connectors for every combination of AI model and business system, MCP provides a unified interface. Any MCP-compatible agent can connect to any MCP-compatible system — a scalability advantage that becomes more valuable as AI deployments expand across the enterprise.
LLMs vs. Traditional Language Models: What Changed
Before LLMs, language models were trained for specific tasks — sentiment classification, named entity recognition, translation — and could not transfer knowledge between tasks. Each application required its own trained model. LLMs changed this fundamentally by creating general-purpose language intelligence that can handle a broad range of tasks from a single pre-trained base.
The practical implication for SAP customers is significant. Joule does not need a separate AI model for HR questions, a different one for financial queries, and another for procurement workflows. A single LLM, fine-tuned for SAP context and grounded with RAG, handles all of these through the same conversational interface. That convergence is what makes the “AI copilot across the enterprise” vision achievable rather than aspirational.
Key Takeaways
- LLMs learn statistical patterns from text — they do not memorise facts, which is both their strength and the source of hallucinations
- RAG grounds LLM responses in real, current data by adding a retrieval step before generation — essential for trustworthy enterprise AI
- Vector embeddings enable semantic search — finding relevant content by meaning, not just keywords
- MCP enables AI agents to act on external systems, not just read from them
- Fine-tuning aligns general LLMs with SAP-specific business language and process context
Next in the series: Post 3 — SAP Business AI Strategy: The Big Picture →