Build a Support Bot From Your Company’s Knowledge Base With Python and OpenAI

Introduction
In today’s fast-paced digital world, customer support has become an essential element of any successful business. As a result, organizations are consistently looking for ways to improve their customer service offerings and maximize customer satisfaction. One of the most effective strategies for achieving this is by leveraging artificial intelligence (AI) to create support bots that can quickly and accurately address customer inquiries.
In this blog, we will show you how to build a powerful support bot using your company’s knowledge base and ChatGPT by OpenAI. This combination will transform your customer support experience, providing immediate, accurate, and engaging responses to your customers’ questions and concerns.
We’ve made setting up your support bot quick and easy with our pre-built implementation on GitHub (Github Repo). Simply edit the company name and add your documents to the documents folder. Get started effortlessly and enhance your customer support experience with this powerful AI-driven solution.
Get Started
Install Requirements
Create a new directory qna-app; we will run our app there. Run the line below in your terminal to install the requirements:
pip3 install flask openai python-dotenv glob2 numpy
Adding OpenAI Key
Create a .env file and replace YOUR_OPENAI_KEY_HERE with your OpenAI API key:
OPENAI_KEY=YOUR_OPENAI_KEY_HERE
Adding Documents to Search From
Now, create a directory named documents and add the files you want to search from. Files should be in .txt format.
For now, you can download this documents file and unzip it in your project root folder. The folder contains 3 files: an About Us page and 2 blog pages from dreamboat.ai.
Create Embeddings
GPT has a token limit, so you cannot pass all of your data to it with every request. To tackle this problem, we create embeddings.
Think of an embedding as a list of numbers (a vector) assigned to a piece of text, where the numbers capture the meaning of that text. Later, we will create an embedding for the question asked by the user, find the text blocks whose embeddings are closest in meaning (i.e. most relevant to the question), and pass only those to GPT.
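To make this concrete, here is a minimal sketch of the cosine-similarity idea used later in this tutorial. The three-dimensional vectors are made up purely for illustration (real text-embedding-ada-002 embeddings have 1536 dimensions); the point is that texts with similar meanings get vectors pointing in similar directions, so their similarity score is close to 1.

```python
def cosine_similarity(vec1, vec2):
    # Dot product divided by the product of the vector magnitudes
    dot_product = sum(a * b for a, b in zip(vec1, vec2))
    magnitude1 = sum(a * a for a in vec1) ** 0.5
    magnitude2 = sum(b * b for b in vec2) ** 0.5
    return dot_product / (magnitude1 * magnitude2)

# Made-up toy vectors: "dog" and "puppy" point in similar directions,
# while "invoice" points somewhere else entirely
dog = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(dog, puppy))    # close to 1.0
print(cosine_similarity(dog, invoice))  # close to 0.0
```

This is exactly the comparison our support bot will make between the user's question and each stored document.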
Create an embed_text.py file and paste the contents below into it:
import openai
import os
import csv
import glob
from dotenv import load_dotenv

load_dotenv()

api_key = os.environ.get('OPENAI_KEY')
openai.api_key = api_key

if api_key is None or api_key == "YOUR_OPENAI_KEY_HERE":
    print("Invalid API key")
    exit()

dir_path = os.path.join(os.getcwd(), 'documents')
dir_full_path = os.path.join(dir_path, '*.txt')
embeddings_filename = "embeddings.csv"

# Loop through all .txt files in the documents folder
text_array = []
for file in glob.glob(dir_full_path):
    # Read each file, replacing newlines with spaces so every
    # document becomes a single block of text
    with open(file, 'r') as f:
        text = f.read().replace('\n', ' ')
    text_array.append(text)

# This array is used to store the embeddings
embedding_array = []

# Loop through each element of the array
for text in text_array:
    # Pass the text to the embeddings API, which returns a vector
    response = openai.Embedding.create(
        input=text,
        model="text-embedding-ada-002"
    )
    # Extract the embedding from the response object
    embedding = response['data'][0]["embedding"]
    # Pair the vector with its original text and store it in the list
    embedding_array.append({'embedding': embedding, 'text': text})

with open(embeddings_filename, 'w', newline='') as f:
    # Set the CSV headers
    fieldnames = ['embedding', 'text']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for obj in embedding_array:
        # Store the embedding vector as a string so the commas between
        # its values don't break the CSV columns
        writer.writerow({'embedding': str(obj['embedding']), 'text': obj['text']})

print("Embeddings saved to:", embeddings_filename)
Now that we have a script that creates the embeddings, let's run it.
Create the embeddings by running the command below in your terminal:
python embed_text.py
If you have done everything right, you will see an "Embeddings saved to: embeddings.csv" message on your screen, and a file embeddings.csv will be created in your project root folder.
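If you want to sanity-check how the data is stored, the sketch below reproduces the round-trip that answer.py relies on: each vector is written to the CSV as a string and later parsed back into a list with json.loads. The filename sample_embeddings.csv and the three-number vector are made up for illustration.

```python
import csv
import json

# Write one row the same way embed_text.py does: the vector as a string
sample = {'embedding': [0.1, 0.2, 0.3], 'text': 'hello world'}
with open('sample_embeddings.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['embedding', 'text'])
    writer.writeheader()
    writer.writerow({'embedding': str(sample['embedding']),
                     'text': sample['text']})

# Read it back and parse the string into a list of floats
with open('sample_embeddings.csv') as f:
    row = next(csv.DictReader(f))
restored = json.loads(row['embedding'])
print(restored)  # [0.1, 0.2, 0.3]
```

This works because Python's string form of a list of floats happens to be valid JSON.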
We now have the embeddings; let's create a function that takes in a question and answers it using the information we provided.
Answer Question
Create a file answer.py and paste the contents below into it. Replace the company name at the top with the name of the company whose knowledge base you are using:
import json
import openai
import csv
import os
from dotenv import load_dotenv

load_dotenv()

embeddings_filename = "embeddings.csv"
company_name = "Dreamboats.ai"

def calculate_similarity(vec1, vec2):
    # Calculates the cosine similarity between two vectors
    dot_product = sum(vec1[i] * vec2[i] for i in range(len(vec1)))
    magnitude1 = sum(v ** 2 for v in vec1) ** 0.5
    magnitude2 = sum(v ** 2 for v in vec2) ** 0.5
    return dot_product / (magnitude1 * magnitude2)

def chat():
    openai.api_key = os.environ.get('OPENAI_KEY')
    start_chat = True
    while True:
        if start_chat:
            print("Welcome to the", company_name, "Knowledge Base. How can I help you?")
            print("Type 'quit' to exit.")
            start_chat = False
        else:
            print("Any other questions?")
        question = input("> ")
        # Exit the loop if the user types 'quit' or presses enter
        # without typing anything
        if question == "quit" or not question:
            break
        # Create an embedding for the question
        response = openai.Embedding.create(
            model="text-embedding-ada-002",
            input=[question]
        )
        try:
            question_embedding = response['data'][0]["embedding"]
        except Exception as e:
            print(e)
            continue
        # Store the similarity scores and texts as the code loops
        # through the CSV
        similarity_array = []
        text_array = []
        # Loop through the CSV and calculate the cosine similarity between
        # the question vector and each stored text embedding
        with open(embeddings_filename) as f:
            reader = csv.DictReader(f)
            for row in reader:
                # Parse the embedding column back into a list of floats
                text_embedding = json.loads(row['embedding'])
                similarity_array.append(calculate_similarity(question_embedding, text_embedding))
                text_array.append(row['text'])
        # Find the text with the highest similarity score
        index_of_max = similarity_array.index(max(similarity_array))
        original_text = text_array[index_of_max]
        system_prompt = f"""
You are an AI assistant. You work for {company_name}. You will be asked
questions from a customer and will answer in a helpful and friendly manner.
You will be provided company information from {company_name} under the
[Article] section. The customer question will be provided under the
[Question] section. Answer the customer's question based only on the
article. Only provide the answer to the query; do not repeat part of the
question back. Answer in points, not in long paragraphs.
If the article does not answer the user's question, respond with
"I'm sorry, I don't know."
"""
        question_prompt = f"""
[Article]
{original_text}

[Question]
{question}
"""
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {
                    "role": "system",
                    "content": system_prompt
                },
                {
                    "role": "user",
                    "content": question_prompt
                }
            ],
            temperature=0.2,
            max_tokens=2000,
        )
        try:
            answer = response['choices'][0]['message']['content']
        except Exception as e:
            print(e)
            continue
        # Print the answer in green
        print("\n\033[32mSupport:\033[0m")
        print("\033[32m{}\033[0m".format(answer.lstrip()))
    print("Goodbye! Come back if you have any more questions. :)")

chat()
That’s it. Run the command below in your terminal:
python answer.py
You can now ask the AI any questions about your company, and it will reply with the relevant information you provided. We have also restricted it from answering very general or irrelevant questions not covered by the documents, such as "What is the capital of the USA?"

Adding Company Knowledge Base
You can add documents to the knowledge base by adding them to the documents folder. Files should be in .txt format.
After adding new documents, re-run the embed_text.py script to create embeddings for them, and change the company_name variable in answer.py.