Three building blocks of modern AI
Understand the technology behind ChatGPT, search engines and smart apps.
When you type something into an AI chatbot these days, it almost seems like magic. But behind the scenes, a few very concrete techniques work together. In this article we explain three concepts, in the order in which they build on each other:
1. LLM — Large Language Model
The brain. The AI that understands and generates text.
2. Embeddings — text as coordinates
The method for converting meaning into numbers, so a computer can work with it.
3. IVFFlat Index — smart searching
The technique to find the most relevant information at lightning speed.
The large language model
What exactly is an LLM, and how did it "learn" to talk?
A Large Language Model (LLM) is a computer program that has read an enormous amount of text — books, websites, articles — and has thereby learned how language works.
Think of it like a child growing up and hearing millions of sentences. Over time, the child learns not only words, but also how sentences work, how to answer a question, and how stories are structured. An LLM does exactly the same thing — but in a few weeks, with trillions of words.
At its core, an LLM does one thing: it predicts the next word, over and over. But because it does this so well, at such an enormous scale, it seems as if it truly understands what you mean.
How did the model store all that knowledge? In the form of billions of tiny numbers — the so-called parameters. Think of parameters as the "memories" of the model, spread across an enormous network. The more parameters, the more nuance the model can remember.
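The "predict the next word" idea can be sketched in a few lines of Python. The toy below counts which word follows which in a tiny made-up corpus and always predicts the most frequent successor. This is of course nothing like a real LLM, which learns these patterns in billions of parameters rather than a count table, but the core task is the same.

```python
from collections import Counter, defaultdict

# A miniature next-word predictor: count word pairs in a tiny corpus,
# then predict the most frequent successor of a given word.
corpus = ("the dog runs through the park and the dog plays "
          "near the pond with the dog").split()

successors = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    successors[word][nxt] += 1   # how often does `nxt` follow `word`?

def predict_next(word):
    # The most frequently seen successor wins.
    return successors[word].most_common(1)[0][0]

predict_next("the")  # → "dog" ("dog" follows "the" most often here)
```

A real model predicts from the entire preceding context, not just one word, and outputs probabilities over every possible next word instead of a single winner.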
How is an LLM trained?
Pre-training on the internet
The model reads trillions of words from the internet, Wikipedia, books and more. It learns to predict: "given these words, what probably comes next?"
Fine-tuning on specific tasks
The model is adjusted with thousands of examples of good answers, so it behaves like a helpful assistant.
Human feedback
People rate answers — good or bad — and the model learns from this. This way it keeps getting better at giving useful, honest answers.
Well-known LLMs include ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google) and the open-source model Llama (Meta).
Embeddings
How do you convert the meaning of a text into something a computer can work with?
A computer doesn't understand language — only numbers. Embeddings are the bridge: they translate text into a series of numbers that capture the meaning of that text.
Imagine that every sentence has an address in an enormous imaginary city. Sentences with a similar meaning live in the same neighbourhood. "The dog runs through the park" and "The puppy plays outside" are right next to each other. "The interest rate rose this quarter" lives in a completely different district.
In reality, that "map" is not 2D but has hundreds or thousands of dimensions — 1,536 is a common size — far more than anyone can picture, but for a computer it's just a long row of numbers. That row of numbers is called a vector.
An embedding is like a GPS coordinate for the meaning of a text. Two texts with similar meanings have coordinates that are close together.
Why is this useful? Suppose you have 100,000 articles stored. When you search for "what do dogs eat?", you don't just want to find articles with those exact words, but also articles about "nutrition for dogs" or "what can puppies eat?". With embeddings you can search by meaning, not just by literal words.
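Searching by meaning comes down to comparing vectors. A common measure is cosine similarity: the closer to 1.0, the more alike two texts are. The sketch below uses made-up 3-dimensional vectors (real embedding models produce far longer ones) to show how "same neighbourhood" and "different district" look as numbers.

```python
import math

def cosine_similarity(a, b):
    """How similar two embedding vectors are: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings; the numbers are invented for illustration.
dog_runs      = [0.9, 0.8, 0.1]   # "The dog runs through the park"
puppy_plays   = [0.8, 0.9, 0.2]   # "The puppy plays outside"
interest_rose = [0.1, 0.1, 0.9]   # "The interest rate rose this quarter"

cosine_similarity(dog_runs, puppy_plays)    # ≈ 0.99: same neighbourhood
cosine_similarity(dog_runs, interest_rose)  # ≈ 0.24: different district
```

Notice that the two dog sentences score high even though they share almost no words — that is exactly the "search by meaning" property described above.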
The IVFFlat Index
How do you find the most relevant information at lightning speed, without having to search through everything?
Now that we know every text has a place on the "meaning map", the next problem arises: if you have 1 million texts stored, how do you quickly find the text closest to your search query?
Without a smart technique you would have to compare your search query with all 1 million stored texts — for every single search. That's far too slow. The IVFFlat index (short for Inverted File with Flat search) solves this with a simple but brilliant idea: first divide into neighbourhoods, then search only in the right neighbourhood.
How does it work in practice?
Create groups (once)
When the database is created, all texts are automatically divided into groups (clusters) based on their meaning. Similar texts end up in the same group.
You ask a question
Your search query is also converted into an embedding — a place on the meaning map.
Find the closest group
The index compares your query only with the centre of each group (the centroid) and picks the closest one. That takes microseconds.
Only search that group
Instead of comparing 1 million texts, perhaps only 10,000 are compared. Lightning fast.
IVFFlat might not always find the very best result, but it almost always finds an excellent result — and in a fraction of the time.
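The four steps above can be sketched in plain Python. This is a toy version, not pgvector's actual implementation: a tiny k-means builds the "neighbourhoods" once, and each search scans only the nearest one or two of them (the `nprobe` name for that setting is borrowed from real vector indexes).

```python
import math
import random

def kmeans(points, k, iters=10):
    """Step 1 (once): divide all vectors into k groups around centroids."""
    random.seed(0)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the average of its cluster.
        centroids = [
            [sum(coord) / len(cl) for coord in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

def search(query, centroids, clusters, nprobe=1):
    """Steps 2-4: find the nearest group(s), then scan only those."""
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))
    candidates = [p for c in order[:nprobe] for p in clusters[c]]
    return min(candidates, key=lambda p: math.dist(query, p))

# 1,000 random 2D "embeddings", divided into 10 neighbourhoods.
points = [[random.random(), random.random()] for _ in range(1000)]
centroids, clusters = kmeans(points, k=10)
nearest = search([0.5, 0.5], centroids, clusters)
```

With `nprobe=1` only about a tenth of the points are examined — which is also why the result can occasionally miss the true nearest neighbour, exactly the trade-off described above. Raising `nprobe` trades speed for accuracy.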
How do they work together?
The three concepts as one system.
Imagine you're building a smart search engine that searches 100,000 documents. Here's how it works behind the scenes:
Embedding model converts your question into numbers
Your search query becomes a vector — a series of numbers representing the meaning.
IVFFlat index finds the most relevant documents
In milliseconds, the documents with the most similar meaning are found.
LLM formulates an answer based on those documents
The language model reads the found documents and writes a clear answer in plain language.
This pattern — embeddings + vector index + LLM — is the foundation of virtually all modern AI search engines, chatbots with their own data, and smart document assistants. Now that you understand this, you grasp the fundamentals of how modern AI applications work.