Talking to PDFs: GPT-4 and LangChain

Ulrik Thyge Pedersen
Published in Better Programming
5 min read · May 9, 2023
Image by Author with @MidJourney

Have you ever wanted to create a chatbot that can answer questions about PDF files?

With the help of GPT-4 and LangChain, building one is more approachable than it used to be. In this article, we’ll walk through the process of creating your own PDF chatbot with the two.

First, some background on GPT-4 and LangChain. GPT-4 is, at the time of writing, the latest model in the GPT (Generative Pre-trained Transformer) series developed by OpenAI.

It generates fluent, human-like text and can be used for a wide range of natural language processing tasks, including chatbots.

LangChain, on the other hand, is a Python library for building applications on top of large language models such as GPT-4, chatbots among them. Now, let’s get started with creating our PDF chatbot!

Install Dependencies

To get started, we’ll need to install a few dependencies. First, let’s install the latest version of LangChain using pip:

pip install langchain

Next, we’ll need to install some additional libraries for working with PDF files. We recommend using the PyPDF2 library, which can be installed using pip:

pip install PyPDF2
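Before moving on, it can help to confirm that both packages are importable from the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and not part of LangChain or PyPDF2.

```python
from importlib.util import find_spec

# Import names of the two packages installed above.
REQUIRED_MODULES = ("langchain", "PyPDF2")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

If the script reports a missing package, the most likely cause is that pip installed into a different Python environment than the one running the script.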

Time to get some data!

Prepare the Data

Before we can start building our chatbot, we’ll need to prepare our data. In this case, our data will be a collection of PDF files that our chatbot will be able to answer questions about.

To prepare our data, we’ll first need to convert our PDF files into a format that our chatbot can understand. We’ll do this by extracting the text from each PDF file using the PyPDF2 library.
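For a whole collection of files, the same idea can be wrapped in a small loop. The sketch below is one possible way to do it, not necessarily the approach used later in this article: the function names load_pdf_texts and extract_with_pypdf2 and the folder layout are assumptions made for illustration. The per-file extraction step is passed in as a parameter so it can be swapped out, and the default assumes PyPDF2 3.x, where the reader class is called PdfReader.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_with_pypdf2(pdf_path: Path) -> str:
    """Extract the text of one PDF with PyPDF2 (imported lazily)."""
    from PyPDF2 import PdfReader  # requires `pip install PyPDF2`

    reader = PdfReader(str(pdf_path))
    # extract_text() may return an empty string for pages without a text layer.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def load_pdf_texts(
    folder: str,
    extract: Callable[[Path], str] = extract_with_pypdf2,
) -> Dict[str, str]:
    """Map each *.pdf file name in `folder` to its extracted text."""
    texts = {}
    # sorted() keeps the result order stable across runs.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        texts[pdf_path.name] = extract(pdf_path)
    return texts
```

Calling load_pdf_texts("pdfs") on a hypothetical folder named pdfs would return a dictionary keyed by file name, which keeps each document's text separate for later steps.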

Here’s an example of how to extract the text from a PDF file using PyPDF2:

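import PyPDF2

# PyPDF2 3.x replaced PdfFileReader with PdfReader
with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)

    page_text = ''
    # Assumed completion: the source listing is cut off after "for i in"
    for i in range(len(pdf_reader.pages)):
        # extract_text() can return '' for pages without a text layer
        page_text += pdf_reader.pages[i].extract_text() or ''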


Written by Ulrik Thyge Pedersen

Senior Data Scientist @NTT Data | From Science to Data Science | Kaggle Master | linkedin.com/in/ulrikthygepedersen | github.com/UlrikThygePedersen
