
Using GPT-3 for Search and Recommendations of Text Content

Utilize cosine similarity to find similar documents.

Mustafa Abdelhamid · Published in Better Programming · Jan 18, 2023 · 4 min read

In this blog post, we will be discussing how to use GPT-3 vectors for a recommendation system that utilizes cosine similarity to find similar documents.

GPT-3, developed by OpenAI, is a state-of-the-art language model that has been trained on a massive amount of text data.

One of the key features of GPT-3 is its ability to generate high-quality text. It can also produce vector representations (embeddings) of input text, which can be used for a variety of natural language processing tasks, such as document similarity analysis.

Step 1: Generate GPT-3 Vectors

The first step in using GPT-3 vectors for a recommendation system is to generate the vectors for your set of documents. This can be done with the OpenAI API's embeddings endpoint, which allows you to send a block of text and receive a vector representation in return.

To generate the vectors for your set of documents, you will need to send each document to the API and store the returned vectors in an array. Here is an example of how to generate GPT-3 vectors for a set of documents in Python:
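The sketch below assumes the openai Python package (the pre-1.0 interface) and the text-embedding-ada-002 embedding model; the documents are placeholder strings:

import openai

openai.api_key = "YOUR_API_KEY"  # assumed: load your own API key

def get_embedding(text, model="text-embedding-ada-002"):
    # Send one document to the embeddings endpoint and return its vector
    response = openai.Embedding.create(input=[text], model=model)
    return response["data"][0]["embedding"]

docs = [
    "How to bake sourdough bread at home",
    "A beginner's guide to baking bread",
    "Top 10 hiking trails in the Alps",
]

# One GPT-3 vector (embedding) per document
vectors = [get_embedding(doc) for doc in docs]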

Step 2: Calculate Cosine Similarity

Once you have the vectors for your set of documents, you can use the cosine similarity metric to find the similarity between them. Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. To calculate the cosine similarity between two vectors, you can use the following formula:

cosine_similarity = (A · B) / (||A|| * ||B||)

Where A · B is the dot product of the vectors for two documents, and ||A|| and ||B|| are the magnitudes of the vectors. Here is an example of how to calculate the cosine similarity between all pairs of documents in Python:
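This sketch reuses the docs and vectors from the previous step and implements the formula above with NumPy:

import numpy as np

def cosine_similarity(a, b):
    # (A · B) / (||A|| * ||B||)
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Cosine similarity for every pair of documents
pair_scores = {}
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        pair_scores[(i, j)] = cosine_similarity(vectors[i], vectors[j])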

Step 3: Find the Most Similar Documents

Now that you have the cosine similarity scores for all the pairs of documents, you can find the most similar documents by sorting the scores in descending order and selecting the top N results.

You can use the resulting list of similar documents to recommend similar content to your users. Here is an example of how to find similar documents using cosine similarity in Python:
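The sketch below builds on the cosine_similarity function from the previous step; the threshold of 0.8 and N = 5 are arbitrary example values:

similar_documents = []
threshold = 0.8  # example cutoff, tune for your data
N = 5

for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        score = cosine_similarity(vectors[i], vectors[j])
        if score > threshold:
            similar_documents.append((score, docs[i], docs[j]))

# Sort by similarity, highest first, and keep the top N pairs
similar_documents.sort(key=lambda item: item[0], reverse=True)
for score, doc_a, doc_b in similar_documents[:N]:
    print(f"{score:.3f}  {doc_a}  <->  {doc_b}")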

In this example, docs is a list of strings representing the set of documents. vectors is a list of the GPT-3 vectors generated for each document using the OpenAI GPT-3 API.

The nested loops iterate over all pairs of documents and calculate the cosine similarity between them. The documents with a cosine similarity greater than a certain threshold are added to the similar_documents list.

Finally, the similar documents are sorted based on cosine similarity in descending order and the top N similar documents are printed.

Please note that you will have to replace the placeholder code that generates the vector representation of each document with the actual call you make to the OpenAI GPT-3 API, as explained above.

Step 4: Taking This to Production

You can index the GPT-3 vectors in Elasticsearch or ArangoDB in a way that supports cosine similarity.

In Elasticsearch, for example, you can use the dense_vector field type as described here. This is an example of what the mapping could look like:

{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "similarity": "cosine",
        "index": true
      },
      "document": {
        "type": "text"
      }
    }
  }
}
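Here is a rough sketch of indexing and querying with the official elasticsearch Python client (version 8.x), assuming a local cluster and an index named documents; note that dims in the mapping must match the dimensionality of your embedding model (text-embedding-ada-002, for example, returns 1536-dimensional vectors):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index each document together with its GPT-3 vector
for i, (doc, vector) in enumerate(zip(docs, vectors)):
    es.index(index="documents", id=i, document={"document": doc, "my_vector": vector})

# Score all documents by cosine similarity to a query vector
query_vector = vectors[0]  # e.g. the document the user is currently reading
response = es.search(
    index="documents",
    size=5,
    query={
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'my_vector') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["document"])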

I didn’t evaluate this with ArangoDB, but you can follow the documentation starting here and let me know in the comments if you get promising results.

It’s important to note that vector similarity search is computationally expensive, so you might want to consider using a vector similarity search engine like Faiss or NMSLIB instead, as they are designed to handle large vector datasets with fast similarity search.
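For example, here is a minimal Faiss sketch that L2-normalizes the vectors so that inner-product search is equivalent to cosine similarity (it assumes the faiss-cpu package and the docs and vectors from the earlier steps):

import numpy as np
import faiss

# Normalize the vectors so that inner product equals cosine similarity
matrix = np.array(vectors, dtype="float32")
faiss.normalize_L2(matrix)

index = faiss.IndexFlatIP(matrix.shape[1])  # exact inner-product index
index.add(matrix)

# Find the nearest neighbours of the first document
# (the top hit will be the document itself, with a score of ~1.0)
k = min(5, len(docs))
scores, neighbours = index.search(matrix[:1], k)
for idx, score in zip(neighbours[0], scores[0]):
    print(f"{score:.3f}  {docs[idx]}")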

Please note that this is a high-level overview of the steps required, and you may need to refer to the specific documentation of Elasticsearch or ArangoDB for more detailed information on how to implement each step.

Conclusion

In this blog post, we have discussed how to use GPT-3 vectors for a recommendation system that utilizes cosine similarity to find similar documents.

By generating GPT-3 vectors for your set of documents, calculating the cosine similarity between them, and finding the most similar documents, you can create a powerful recommendation system that can help your users discover new and relevant content.

For more information about GPT-3 embeddings, please refer to the OpenAI documentation.

Written by Mustafa Abdelhamid: Software and Data Engineer | Freelancer | LinkedIn: dosht | Works at Transgate.ai
