Connecting your Enterprise Data to an LLM: Should I Use RAG or a Custom Fine-Tuned Model?
- henri
- Jul 15, 2024
- 3 min read
As enterprises increasingly turn to AI to gain insights from their data, one of the key decisions they face is how to connect their enterprise data to a large language model (LLM). Clients often come to me with very high expectations for building a custom connector. Two popular approaches are Retrieval-Augmented Generation (RAG) and custom fine-tuned models. In this article, we will explore both options, their advantages, and how a hybrid approach might be the best solution for your needs. Let's take a simple example before the deep dive.
Let's consider we want to connect to a CRM that has 5 tables/entities. We might want to ask questions like "Can you tell me how many clients live in France?", which is Text-to-SQL retrieval, but we might also ask "Can you summarize all conversations I had with Elon Musk?", which is more of a semantic text retrieval. Or, worse, we can combine multiple questions into one, like "Can you retrieve all my clients in France and export them to a CSV. Then, for each contact, add a new column summarizing their latest invoice amounts." Mmmm, this gets tricky.
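To make this concrete, here is a minimal sketch of routing a question to the right retrieval strategy. The keyword heuristic and the two route names are purely illustrative; a production system would use an LLM classifier or function calling instead of string matching.

```python
# Toy router: decide whether a question needs structured (Text-to-SQL)
# retrieval or semantic retrieval. The marker list is an assumption for
# illustration, not a robust classifier.

def route_question(question: str) -> str:
    q = question.lower()
    sql_markers = ("how many", "count", "average", "export", "list all")
    if any(marker in q for marker in sql_markers):
        return "text-to-sql"        # structured query over the CRM tables
    return "semantic-retrieval"     # similarity search over conversations

print(route_question("Can you tell me how many clients live in France?"))
print(route_question("Can you summarize all conversations I had with Elon Musk?"))
```

The multi-step question from the example above would need to hit both routes in sequence, which is exactly why a single retrieval strategy falls short.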
What is RAG?
Retrieval-Augmented Generation (RAG) is an approach where the LLM retrieves relevant documents or data points from a large database and then generates responses based on this retrieved information. This method involves three key components:
Embeddings: Embeddings are vector representations of data that capture the semantic meaning of words or phrases. They are used to measure the similarity between queries and documents.
Vector Database: A vector database stores embeddings and allows for efficient similarity searches. Popular vector databases include Qdrant, Faiss, Chroma, and Azure AI Search.
Retrieval Process: When a query is made, the LLM retrieves relevant data from the vector database using similarity search and generates a response based on this data. The similarity search function you use is key. Most developers use text-only RAG, which is very limited; keep multi-modal RAG in mind from the start.
[Figure: an illustration of the RAG concept]
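The three components above can be sketched end to end in a few lines. Here a bag-of-words vector stands in for a real embedding and a plain list stands in for the vector database; in production you would use a learned embedding model and a store such as Qdrant, Faiss, or Chroma.

```python
import math
from collections import Counter

# Toy stand-in for an embedding model: a bag-of-words term-count vector.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

# Cosine similarity between two sparse term-count vectors.
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Invoice #42 for Elon Musk, total 1200 EUR",
    "Meeting notes: conversation with Elon Musk about the new contract",
    "Client record: Marie Curie, Paris, France",
]
# Stand-in for the vector database: documents paired with their vectors.
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("conversations with Elon Musk"))
```

The retrieved documents are then pasted into the LLM prompt as context for generation; that last step is what makes it retrieval-*augmented* generation.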
Custom Fine-Tuned Model
Creating a custom fine-tuned model involves training an LLM with your enterprise data. This process requires substantial computational resources, typically involving GPUs, and consists of the following steps:
Data Collection: Gather and preprocess the enterprise data that will be used for training.
Model Training: Use GPUs to train the LLM with your custom data. This involves adjusting the model's weights to better understand and generate responses based on your specific data.
Deployment: Once the model is trained, it can be deployed to generate highly accurate responses specific to your enterprise data.
[Figure: a visual representation of the data training process]
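The data collection step deserves a closer look, since it is where most of the work lies. Below is a hedged sketch of turning raw CRM records into instruction-style training pairs; the field names and prompt template are hypothetical, and the exact JSONL shape depends on your fine-tuning provider's format.

```python
import json

# Illustrative CRM records; adapt the fields to your actual schema.
crm_records = [
    {"client": "Marie Curie", "country": "France", "last_invoice": 1200},
    {"client": "Ada Lovelace", "country": "UK", "last_invoice": 800},
]

# Convert one record into a prompt/completion training pair.
def to_training_example(record: dict) -> dict:
    prompt = f"What is the latest invoice amount for {record['client']}?"
    completion = f"{record['client']}'s latest invoice is {record['last_invoice']} EUR."
    return {"prompt": prompt, "completion": completion}

# Write one JSON object per line (the common JSONL training format).
with open("train.jsonl", "w") as f:
    for record in crm_records:
        f.write(json.dumps(to_training_example(record)) + "\n")
```

The resulting file is what the GPU training step then consumes; generating many such pairs per entity (dates, addresses, invoice history) is what gives the fine-tuned model its deep knowledge of your cold data.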
Advantages of Custom Fine-Tuned Models
Accuracy: A fine-tuned model can provide highly accurate and context-specific responses since it is trained directly on your enterprise data.
Reliability: Fine-tuned models are more reliable in understanding and explaining complex or proprietary information specific to your enterprise.
Advanced architectures: Fine-tuned models can also serve as building blocks for multi-query pipelines and mixture-of-experts setups, where specialized models handle different query types.
Combining RAG and Custom Fine-Tuned Models
While both RAG and custom fine-tuned models have their advantages, a hybrid approach can often provide the best results. Here’s how you can leverage both methods:
Cold Data: For data that does not change frequently (cold data), it is beneficial to train this data into the custom fine-tuned model. This ensures that the model has a deep understanding of the static data, leading to highly accurate responses.
Dynamic Data: For data that changes frequently or requires regular updates (dynamic data), using RAG can be more efficient. The model can retrieve the most up-to-date information from the vector database and generate responses based on the latest data.
This hybrid approach allows you to maintain the accuracy and reliability of a fine-tuned model while leveraging the flexibility and up-to-date information provided by RAG.
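The cold/dynamic split above can be expressed as a small dispatcher. This is a minimal sketch under stated assumptions: the topic list is illustrative, and both model calls are stubbed placeholders rather than real API calls.

```python
# Hybrid routing sketch: dynamic topics go through RAG for fresh context,
# everything else is answered by the fine-tuned model's baked-in cold data.
# DYNAMIC_TOPICS is an assumed, hand-picked list for illustration.
DYNAMIC_TOPICS = {"invoice", "conversation", "order"}

def answer(question: str) -> str:
    tokens = set(question.lower().split())
    if tokens & DYNAMIC_TOPICS:
        # RAG path: fetch up-to-date documents before generation (stubbed).
        context = "<documents retrieved from the vector DB>"
        return f"LLM answer grounded in: {context}"
    # Cold path: the fine-tuned model already knows this data (stubbed).
    return "fine-tuned model answer from baked-in cold data"

print(answer("What does our refund policy say?"))
print(answer("Summarize the latest invoice for Marie"))
```

In practice the routing decision itself is often delegated to the LLM via function calling, but the principle is the same: retrieve what changes, train on what doesn't.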
Conclusion
Deciding between RAG and a custom fine-tuned model depends on your enterprise's specific needs and data dynamics. For static, less frequently changing data, a fine-tuned model can provide high accuracy and reliability. For dynamic, frequently changing data, RAG offers flexibility and up-to-date information retrieval. Often, a combination of both approaches can provide the best results, ensuring that your enterprise data is both accurately represented and current.
By understanding and leveraging these technologies, you can connect your enterprise data to an LLM in a way that maximizes accuracy, flexibility, and value.