Three building blocks of modern AI
Understand the technology behind ChatGPT, search engines and smart apps.
When you type something into an AI chatbot these days, it almost seems like magic. But behind the scenes, a few very concrete techniques work together. In this article we explain three concepts, in the order in which they build on each other:
1. LLM — Large Language Model
The brain. The AI that understands and generates text.
2. Embeddings — text as coordinates
The method for converting meaning into numbers, so a computer can work with it.
3. IVFFlat Index — smart searching
The technique to find the most relevant information at lightning speed.
The large language model
What exactly is an LLM, and how did it "learn" to talk?
A Large Language Model (LLM) is a computer program that has read an enormous amount of text — books, websites, articles — and has thereby learned how language works.
Think of it like a child growing up and hearing millions of sentences. Over time, the child learns not only words, but also how sentences work, how to answer a question, and how stories are structured. An LLM does exactly the same thing — but in a few weeks, with trillions of words.
At its core, an LLM does one thing: it predicts the next word, over and over. But because it does this so well, at such an enormous scale, it seems as if it truly understands what you mean.
How did the model store all that knowledge? In the form of billions of tiny numbers — the so-called parameters. Think of parameters as the "memories" of the model, spread across an enormous network. The more parameters, the more nuance the model can remember.
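The "predict the next word" idea can be sketched in a few lines of Python. The toy below counts which word follows which in a tiny made-up corpus and always predicts the most frequent successor. This is of course nothing like a real LLM, which learns these patterns in billions of parameters rather than a count table, but the core task is the same.

```python
from collections import Counter, defaultdict

# A miniature next-word predictor: count word pairs in a tiny corpus,
# then predict the most frequent successor of a given word.
corpus = ("the dog runs through the park and the dog plays "
          "near the pond with the dog").split()

successors = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    successors[word][nxt] += 1   # how often does `nxt` follow `word`?

def predict_next(word):
    # The most frequently seen successor wins.
    return successors[word].most_common(1)[0][0]

predict_next("the")  # → "dog" ("dog" follows "the" most often here)
```

A real model predicts from the entire preceding context, not just one word, and outputs probabilities over every possible next word instead of a single winner.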
How is an LLM trained?
Pre-training on the internet
The model reads trillions of words from the internet, Wikipedia, books and more. It learns to predict: "given these words, what probably comes next?"
Fine-tuning on specific tasks
The model is adjusted with thousands of examples of good answers, so it behaves like a helpful assistant.
Human feedback
People rate answers — good or bad — and the model learns from this. This way it keeps getting better at giving useful, honest answers.
Well-known LLMs include ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google) and the open-source model Llama (Meta).
Embeddings
How do you convert the meaning of a text into something a computer can work with?
A computer doesn't understand language — only numbers. Embeddings are the bridge: they translate text into a series of numbers that capture the meaning of that text.
Imagine that every sentence has an address in an enormous imaginary city. Sentences with a similar meaning live in the same neighbourhood. "The dog runs through the park" and "The puppy plays outside" are right next to each other. "The interest rate rose this quarter" lives in a completely different district.
In reality, that "map" is not 2D but has hundreds or thousands of dimensions — 1,536 is a common size — far more than anyone can picture, but for a computer it's just a long row of numbers. That row of numbers is called a vector.
An embedding is like a GPS coordinate for the meaning of a text. Two texts with similar meanings have coordinates that are close together.
Why is this useful? Suppose you have 100,000 articles stored. When you search for "what do dogs eat?", you don't just want to find articles with those exact words, but also articles about "nutrition for dogs" or "what can puppies eat?". With embeddings you can search by meaning, not just by literal words.
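Searching by meaning comes down to comparing vectors. A common measure is cosine similarity: the closer to 1.0, the more alike two texts are. The sketch below uses made-up 3-dimensional vectors (real embedding models produce far longer ones) to show how "same neighbourhood" and "different district" look as numbers.

```python
import math

def cosine_similarity(a, b):
    """How similar two embedding vectors are: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings; the numbers are invented for illustration.
dog_runs      = [0.9, 0.8, 0.1]   # "The dog runs through the park"
puppy_plays   = [0.8, 0.9, 0.2]   # "The puppy plays outside"
interest_rose = [0.1, 0.1, 0.9]   # "The interest rate rose this quarter"

cosine_similarity(dog_runs, puppy_plays)    # ≈ 0.99: same neighbourhood
cosine_similarity(dog_runs, interest_rose)  # ≈ 0.24: different district
```

Notice that the two dog sentences score high even though they share almost no words — that is exactly the "search by meaning" property described above.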
The IVFFlat Index
How do you find the most relevant information at lightning speed, without having to search through everything?
Now that we know every text has a place on the "meaning map", the next problem arises: if you have 1 million texts stored, how do you quickly find the text closest to your search query?
Without a smart technique you would have to compare your search query with all 1 million stored texts — for every single search. That's far too slow. The IVFFlat index (short for Inverted File with Flat search) solves this with a simple but brilliant idea: first divide into neighbourhoods, then search only in the right neighbourhood.
How does it work in practice?
Create groups (once)
When the database is created, all texts are automatically divided into groups (clusters) based on their meaning. Similar texts end up in the same group.
You ask a question
Your search query is also converted into an embedding — a place on the meaning map.
Find the closest group
The index compares your query only with the centre of each group (the centroid) and picks the closest one. That takes microseconds.
Only search that group
Instead of comparing 1 million texts, perhaps only 10,000 are compared. Lightning fast.
IVFFlat might not always find the very best result, but it almost always finds an excellent result — and in a fraction of the time.
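The four steps above can be sketched in plain Python. This is a toy version, not pgvector's actual implementation: a tiny k-means builds the "neighbourhoods" once, and each search scans only the nearest one or two of them (the `nprobe` name for that setting is borrowed from real vector indexes).

```python
import math
import random

def kmeans(points, k, iters=10):
    """Step 1 (once): divide all vectors into k groups around centroids."""
    random.seed(0)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the average of its cluster.
        centroids = [
            [sum(coord) / len(cl) for coord in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

def search(query, centroids, clusters, nprobe=1):
    """Steps 2-4: find the nearest group(s), then scan only those."""
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))
    candidates = [p for c in order[:nprobe] for p in clusters[c]]
    return min(candidates, key=lambda p: math.dist(query, p))

# 1,000 random 2D "embeddings", divided into 10 neighbourhoods.
points = [[random.random(), random.random()] for _ in range(1000)]
centroids, clusters = kmeans(points, k=10)
nearest = search([0.5, 0.5], centroids, clusters)
```

With `nprobe=1` only about a tenth of the points are examined — which is also why the result can occasionally miss the true nearest neighbour, exactly the trade-off described above. Raising `nprobe` trades speed for accuracy.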
How do they work together?
The three concepts as one system.
Imagine you're building a smart search engine that searches 100,000 documents. Here's how it works behind the scenes:
Embedding model converts your question into numbers
Your search query becomes a vector — a series of numbers representing the meaning.
IVFFlat index finds the most relevant documents
In milliseconds, the documents with the most similar meaning are found.
LLM formulates an answer based on those documents
The language model reads the found documents and writes a clear answer in plain language.
This pattern — embeddings + vector index + LLM — is the foundation of virtually all modern AI search engines, chatbots with their own data, and smart document assistants. Now that you understand this, you grasp the fundamentals of how modern AI applications work.