Fine-Tuning Your Embedding Model to Maximize Relevance Retrieval in RAG Pipeline
NVIDIA SEC 10-K filing analysis before and after fine-tuning embeddings

Let’s continue from our previous article, Fine-Tuning the GPT-3.5 RAG Pipeline with GPT-4 Training Data. This time, let’s dive into fine-tuning the other end of the spectrum of our RAG (Retrieval Augmented Generation) pipeline — the embedding model.
By fine-tuning our embedding model, we enhance our system’s ability to retrieve the most relevant documents, ensuring that our RAG pipeline performs at its best.
We have been using OpenAI’s embedding model text-embedding-ada-002 for most of the RAG pipelines in our LlamaIndex blog series. However, OpenAI does not offer fine-tuning for text-embedding-ada-002, so in this article let’s explore fine-tuning an open-source embedding model instead.
BAAI/bge-small-en
The current number 1 embedding model on Hugging Face’s MTEB (Massive Text Embedding Benchmark) Leaderboard is bge-large-en, developed by the Beijing Academy of Artificial Intelligence (BAAI). It is a pretrained transformer model that can be used for various natural language processing tasks, such as text classification, question answering, and semantic search. The model is trained on a massive dataset of text and code, and its performance is evaluated on the MTEB benchmark.
For this article, we are going to use one of bge-large-en’s siblings, bge-small-en, a 384-dimensional small-scale model with competitive performance, perfect for running in Google Colab.
Fine-Tune Embedding Model vs. Fine-Tune LLM
From our last article on fine-tuning gpt-3.5-turbo, we gained a solid understanding of the steps involved in fine-tuning an LLM. Compared with LLM fine-tuning, the implementation of fine-tuning bge-small-en has some similarities and some differences.
Similarities
- Both types of fine-tuning follow the same overall approach: generate datasets for training and evaluation, fine-tune the model, and finally compare the performance of the base and fine-tuned models (a minimal sketch of this shared workflow for the embedding case follows below).
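To make that shared workflow concrete for the embedding side, here is a minimal sketch using LlamaIndex’s finetuning module (assuming a ~0.8.x version of llama_index; the input file names and output path are placeholders, and dataset generation calls an LLM, OpenAI by default, to write questions for each node):

```python
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser
from llama_index.finetuning import (
    generate_qa_embedding_pairs,
    SentenceTransformersFinetuneEngine,
)

# 1. Build training and validation corpora and generate synthetic (question, context) pairs.
parser = SimpleNodeParser.from_defaults()
train_nodes = parser.get_nodes_from_documents(
    SimpleDirectoryReader(input_files=["train_docs.pdf"]).load_data()
)
val_nodes = parser.get_nodes_from_documents(
    SimpleDirectoryReader(input_files=["val_docs.pdf"]).load_data()
)
train_dataset = generate_qa_embedding_pairs(train_nodes)
val_dataset = generate_qa_embedding_pairs(val_nodes)

# 2. Fine-tune bge-small-en on the generated pairs.
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en",
    model_output_path="bge-small-en-finetuned",
    val_dataset=val_dataset,
)
finetune_engine.finetune()

# 3. Retrieve the fine-tuned model so it can be evaluated against the base model.
embed_model = finetune_engine.get_finetuned_model()
```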