Parts & terminology

Before we dive in, here are some terms that will come up along the way.

AI Model

A model is a piece of software trained on a set of data to recognize certain patterns. Depending on the model, the output can be text, images, or a predicted numerical value.

Some examples of AI models: 

  • DALL-E: a model that generates images.
  • Whisper: a model that converts audio into text.
  • Embedding models: models that convert text into numerical form.
  • GPT: a family of models that generate natural language.
Token


A token is a piece of text. For example, the text "Lorem ipsum dolor sit amet" contains several tokens: "sit", "dol", "or", "orem", and so on.
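Real models use learned sub-word tokenizers (such as byte-pair encoding), so the exact token boundaries depend on the model. The toy function below is only an illustration of the idea that tokens are pieces of text rather than whole words; it is not a real tokenizer.

```python
# Toy illustration only: real tokenizers learn their splits from data.
# Here we simply chop each word into chunks of at most max_len characters.
def toy_tokenize(text: str, max_len: int = 4) -> list[str]:
    tokens = []
    for word in text.split():
        for i in range(0, len(word), max_len):
            tokens.append(word[i:i + max_len])
    return tokens

print(toy_tokenize("Lorem ipsum dolor sit amet"))
# → ['Lore', 'm', 'ipsu', 'm', 'dolo', 'r', 'sit', 'amet']
```

Note that short words like "sit" stay whole, while longer words split into several tokens, which is why token counts are higher than word counts.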

Prompt


A prompt is a command given to an AI model to generate some type of output. A prompt can consist of several components. Components that often occur: "format" (how the output should come back), "reference" (whether the AI may rely on existing works), "request" (what exactly should happen), and "framing" (background information).

Did you know?

Did you know that creating or fine-tuning prompts is called "prompt engineering"?
Did you also know that there are masses of cheat sheets to create smart prompts? A prompt engineering cheat sheet example.


Temperature

The temperature is a scale from 0 (deterministic) to 1 (random). In our RAG applications, we usually keep it low, because we are not fond of randomness in the answers.
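Under the hood, temperature rescales the model's output scores before sampling. The sketch below shows the mechanism with plain Python (no real model involved): a near-zero temperature makes the highest-scoring option win almost every time, while a temperature of 1 samples from the unmodified distribution.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    # Temperature -> 0 approaches argmax (deterministic);
    # temperature = 1 samples the unmodified softmax distribution.
    scaled = [l / max(temperature, 1e-6) for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the probabilities.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.2]
# Near-zero temperature: (practically) always picks the highest logit.
print(sample_with_temperature(logits, 0.01))  # → 0
```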

Embeddings


Some AI models take text as input and produce a vector of numbers as output. We call these embedding models. Thanks to this numeric representation, we are no longer tied to exact wording (which is useful for smart searching). In other words, this eliminates the need for explicit search terms.

Vector Database

We then store the created embeddings in a vector database. These are databases that store vectors rather than text. All of the site's content can be indexed in such a vector database. When we run a query later, we can find related articles.

LLM (Large Language Model)

A large language model is a type of AI model that can recognize and generate text. LLMs are trained on very large datasets (hence "Large").
Some examples of LLMs:

  • ChatGPT
  • Claude
  • Ernie
  • Llama
  • StableLM

You can compare the best LLMs for yourself here:

Diagram: overview of how RAG works

How it works

Indexing content

We want to query our own content; that way, any change the editors make to the content is immediately used to generate correct answers.

The moment an editor changes the content, we send the text (this can be text from the CMS or from a PDF file) to an embedding model.
For example, we use the Mistral or OpenAI embedding model.
This model turns the text into a vector, which is then stored in the vector database.
This way, every piece of content we want to query has a vector in our vector database almost immediately.
In the diagram, this is steps 1 to 4.
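The indexing flow above can be sketched in a few lines. The `embed()` stub below stands in for a real Mistral or OpenAI embedding call (so the example runs without an API key), and the function and store names are illustrative, not a specific product's API.

```python
import hashlib

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: derive a small fake vector
    # from a hash, purely so the example is runnable offline.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

# In-memory stand-in for the vector database.
vector_store: dict[str, list[float]] = {}

def on_content_changed(article_id: str, text: str) -> None:
    # Called whenever an editor saves content (CMS text or PDF text):
    # embed the text and upsert the vector under the article's id.
    vector_store[article_id] = embed(text)

on_content_changed("article-1", "How to request a parking permit")
print(len(vector_store["article-1"]))  # → 8
```

Because the hook upserts by article id, re-saving an article simply replaces its vector, which is what keeps the index in sync with the editors' changes.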

From query to vector query

Suppose I am visiting the site as an anonymous user and I have a pressing question. I can't find the answer on the site (or at least not without a complex combination of search terms).

In the question field, I enter my question. 
We are going to have the embedding model translate this question into a vector. 

Then, using this vector, we ask the vector database: "which vectors are near this one?". We choose a vector database precisely because it can answer this kind of nearest-neighbour query very well.
We also give this query a limit, which we call K. K should not be too low, but not too high either: the higher you set K, the poorer the results. This seems counterintuitive, yet it is so. We tweak the exact value of K for your situation on a case-by-case basis.

The vector database returns a list of articles whose content is related to your question.

In the diagram, this is steps 5 & 6.
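A brute-force version of the "which vectors are nearby, limited to K" query looks like this. Real vector databases answer it with approximate nearest-neighbour indexes rather than scanning everything; the article names and vectors below are invented for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy index: article id -> stored embedding (made-up values).
store = {
    "opening-hours": [0.9, 0.1, 0.0],
    "parking-permit": [0.1, 0.9, 0.1],
    "contact-form": [0.2, 0.8, 0.2],
}

def top_k(query_vector, k):
    # Rank all stored articles by similarity to the question's vector
    # and keep only the K closest ones.
    ranked = sorted(store, key=lambda name: cosine(store[name], query_vector), reverse=True)
    return ranked[:k]

question_vector = [0.0, 1.0, 0.0]  # pretend embedding of the user's question
print(top_k(question_vector, k=2))  # → ['parking-permit', 'contact-form']
```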

From database result to textual response

Now we are going to create (behind the scenes) a prompt to have the language model (LLM) generate a response. 

We'll create a prompt that looks something like this:

  1. Given these K articles,
  2. Given this user question,
  3. Given our tone of voice and house rules,

Generate an answer to the question.
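Assembling that behind-the-scenes prompt is plain string work. The wording and the house-rules text below are placeholders, not our production template.

```python
def build_prompt(articles: list[str], question: str, house_rules: str) -> str:
    # Stitch the K retrieved articles, the user's question, and the
    # house rules into one instruction for the LLM.
    context = "\n\n".join(f"Article {i + 1}:\n{a}" for i, a in enumerate(articles))
    return (
        f"Given these {len(articles)} articles:\n{context}\n\n"
        f"Given this user question:\n{question}\n\n"
        f"Given our tone of voice and house rules:\n{house_rules}\n\n"
        "Generate an answer to the question, using only the articles above."
    )

prompt = build_prompt(
    ["Parking permits can be requested online via the permit desk."],
    "How do I get a parking permit?",
    "Friendly, concise, no jargon.",
)
print(prompt.startswith("Given these 1 articles"))  # → True
```

The final instruction ("using only the articles above") is what keeps the LLM grounded in the knowledge base instead of inventing an answer.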

End result: The user gets an answer to their question, based on the content from our knowledge base.

Pricing comparison of LLMs

And how much does such a thing cost?

Different language models have different pricing. Fortunately, people have already put them side by side. You can compare the prices here.

We can also estimate, based on the number of articles on your site, how much it would cost to use the AI. We have made an Excel sheet for this purpose. Please contact Frederik Wouters to have it forwarded to you. Be sure to include "AI RAG price excell" in your email!

Want to know more?