How ChatGPT’s Coding Skills Got Me Drunk

ChatGPT helped me scrape cocktail websites to create a universal drink index.

Gal Bashan
Better Programming


This image was generated with DALL-E, obviously.

Lately, I've become a fan of making cocktails. After getting the best equipment and booze money can buy, I realized the leading blocker for me was cocktail recipes. I wanted a platform where I could input the ingredients I have at home, and the output would be a list of cocktails I could make. There are a few apps for that, but they are limited in their variety unless you pay for the app. And since the only thing I like more than cocktails is free cocktails, I was looking for an alternative.

It was about a week after ChatGPT was released, and it had already become my best friend. Could it also be my drinking buddy? I realized many recipes were available online, and I just had to index them correctly. Could ChatGPT build me a tool so I could index them on my own?

TLDR — yes, here is the GitHub repo. There are even ChatGPT-generated READMEs.

The Plan

I decided to give it a go. I wanted to use ChatGPT to build a simple system to index cocktails from the web. The solution would have two parts:

  1. Crawler: a process that takes a domain to crawl and pushes URLs that appear to be cocktail recipes onto a queue. It also follows the other links on each page recursively to find more pages with recipes. This seemed like an easy enough task to let ChatGPT code on its own.
  2. Indexer: a component that takes a URL, determines whether the page contains a cocktail recipe, and stores the recipe in a database. The problem is that blogs with cocktail recipes are highly unstructured and bury the recipe under a lot of unnecessary text. Can ChatGPT help me make sense of this mess?

With the plan in place, I set out to start building. I started by laying down the architecture and had to decide what queue to use. ChatGPT came to the rescue:

After reviewing its suggestions, I decided to go with RabbitMQ since I had some experience with it. I asked ChatGPT to lay out the foundations for me:
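
The generated scaffolding is not reproduced in the text. As a rough sketch of the producer side, assuming the crawler publishes each candidate URL to the same `drink_urls` queue the indexer consumes from later in this post, it could look something like this. The function name `publish_url` is hypothetical; `channel` is expected to be a pika channel.

```python
def publish_url(channel, url, queue="drink_urls"):
    """Declare the queue (safe to repeat) and publish one URL to it."""
    # durable=True matches the declaration on the consumer side, so both
    # ends agree on the queue's properties.
    channel.queue_declare(queue=queue, durable=True)
    # The default exchange ("") routes a message to the queue named by routing_key.
    channel.basic_publish(exchange="", routing_key=queue, body=url.encode("utf-8"))


# Usage against a local broker (needs pika installed and RabbitMQ running):
#   import pika
#   connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   publish_url(connection.channel(), "https://example.com/negroni")
#   connection.close()
```

Passing the channel in, rather than opening a connection inside the function, keeps the crawler logic easy to exercise without a broker.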

With the project set up, the next step was for ChatGPT to develop the crawler.

The Crawler

I asked ChatGPT to do the work for me, and boy, did it deliver:

When I ran the program, I ran into two issues. First, the visited URLs were not cached, and since almost every page linked back to the home page, we ended up in an infinite loop. I asked ChatGPT to handle that, and it modified the code correctly:

Sadly, I cannot recreate this response to capture it in a better resolution.
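
Since the original response only survives as a screenshot, here is a minimal sketch of the idea rather than the generated code: remember every URL already seen and skip it the next time it shows up. The names `LinkParser`, `crawl`, and `fetch` are hypothetical, and `fetch` stands in for whatever downloads a page (URL in, HTML string out).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that never fetches the same URL twice."""
    visited = set()
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # already handled, e.g. the home page linked from everywhere
        visited.add(url)
        order.append(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            # Resolve relative links and drop #fragments so the same page
            # always maps to the same key in the visited set.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in visited:
                queue.append(absolute)
    return order
```

The `max_pages` cap is an extra safety net on top of the visited set, so a very large site cannot keep the crawler running forever.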

The second issue was that it started crawling other domains too. For example, on a page containing a video, the crawler went on to crawl YouTube. I asked ChatGPT to fix that as well, and it obliged:
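
The generated fix is not reproduced in the text. A minimal sketch of the kind of check involved, assuming links have already been resolved to absolute URLs, compares the host of each link with the host of the site being crawled. The function name `same_domain` is hypothetical.

```python
from urllib.parse import urlparse


def same_domain(url, root_url):
    """True when `url` is on the same host as `root_url` (ignoring a leading www.)."""

    def host(u):
        # hostname is lowercased by urlparse and is None for URLs without a host
        name = urlparse(u).hostname or ""
        return name[4:] if name.startswith("www.") else name

    return host(url) == host(root_url)
```

A crawler would call this before queueing a link, so an embedded YouTube video or a `mailto:` link is simply dropped.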

And that was it! The crawler was ready, and the next step was to code the indexer.

Crawler output. There are some mistakes in identifying recipes — the next step will handle that.

The Indexer

As I mentioned before, parsing the content of a web page to identify if it describes a cocktail recipe is challenging. Here is an example of a cocktail recipe web page:

It's almost impossible to write parsing rules that cover every page layout and still extract the ingredients. Once again, ChatGPT to the rescue!

I came across an unofficial Python implementation of the ChatGPT API and decided to use it in my indexer. The idea was simple: with a well-crafted prompt, I should be able to use ChatGPT to extract the ingredients from the cocktail page. Since ChatGPT has no internet access, it couldn't code this part, but it did help me with the generic components. Here is the code I used:

import pika
import time
import sys
import requests
from bs4 import BeautifulSoup
from revChatGPT.revChatGPT import Chatbot
from db import CocktailRecipe, Ingredient, session
import traceback

print(sys.argv)
config = {
    "session_token": sys.argv[1],
    "cf_clearance": sys.argv[2],
    "user_agent": sys.argv[3]
}

chatbot = Chatbot(config, conversation_id=None)

# connect to RabbitMQ
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# create a queue to consume messages from, if it does not already exist
channel.queue_declare(queue='drink_urls', durable=True)

QUESTION = """
From the text below, please understand if it is an article
describing a cocktail recipe. If it is, output the following: The first
line should be: "<cocktail name>: <ranking>". If the ranking doesn't exist
output "-". Use only lowercase letters and the generic name of the cocktail,
without mentioning brands. The following lines contain the ingredients:
each line should contain one ingredient and its amount in the format
"<ingredient>: <amount>". The ingredient part (before the colon)
should contain only the ingredient name, not the amount. If it is not a
cocktail recipe, output only the word "no". Output the ingredients in
their generic name, and don't include a brand. For example, instead of
"bacardi white rum" output "white rum". All of the output should be lower
cased, don't capitalize any word. The text is: %s"""

# define a callback function to process incoming messages
def process_message(ch, method, properties, body):
    time.sleep(5)  # wait between pages
    try:
        url = body.decode("utf-8")  # message bodies arrive as bytes
        print(url)
        page = requests.get(url)
        soup = BeautifulSoup(page.content, "html.parser")

        response = chatbot.get_chat_response(QUESTION % soup.text, output="text")['message']
        if response.strip() == "no":
            print("not a cocktail")
            return
        print(response)
    except Exception:
        # log the failure and move on to the next URL
        print(traceback.format_exc())

# consume messages from the queue, using the callback function to process them
channel.basic_consume(queue='drink_urls', on_message_callback=process_message, auto_ack=True)

# start consuming messages
channel.start_consuming()

This worked like a charm. ChatGPT did an excellent job of working out whether or not a page contained a cocktail recipe, and if it did, the output was almost always in the correct format. I used to think coding in Python was as explicit as coding could get, but it never came this close to plain English.
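
To make the output format concrete, here is a minimal sketch of a parser for it, assuming the reply follows the prompt exactly: a `<cocktail name>: <ranking>` header, then one `<ingredient>: <amount>` line per ingredient, or the single word `no`. This is not the code from the project; the names `ParsedRecipe` and `parse_response` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedRecipe:
    name: str
    ranking: Optional[str]  # None when the reply gives "-" for the ranking
    ingredients: dict = field(default_factory=dict)  # ingredient -> amount


def parse_response(text):
    """Turn a reply in the prompt's format into a ParsedRecipe, or None."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or lines[0].lower() == "no":
        return None  # not a cocktail page
    # Split the header on its last colon: "<cocktail name>: <ranking>"
    name, sep, ranking = lines[0].rpartition(":")
    if not sep:
        return None  # header does not have the expected shape
    ranking = ranking.strip()
    recipe = ParsedRecipe(name=name.strip(), ranking=None if ranking == "-" else ranking)
    for line in lines[1:]:
        ingredient, sep, amount = line.partition(":")
        if sep:  # silently skip lines that are not "<ingredient>: <amount>"
            recipe.ingredients[ingredient.strip()] = amount.strip()
    return recipe
```

Because the reply is only "almost always" well formed, the sketch returns `None` or skips a line instead of raising when something does not match.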

Now I just had to store it in a database. I asked ChatGPT to create the database for me:

Then, I gave it a sample input generated by itself and asked it to write code for inserting the information into the database. The initial implementation it provided was using psycopg2. I asked it to use SQLAlchemy as I find it easier to work with an ORM:

Note that ChatGPT also wrote the logic for parsing its own response.
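
The generated database code only appears as a screenshot. As a rough, self-contained illustration of the same idea, the sketch below swaps the project's SQLAlchemy setup for Python's built-in `sqlite3` module so it runs without a database server. The table names, column names, and the functions `store_recipe` and `cocktails_with` are all assumptions, not the project's actual schema.

```python
import sqlite3

# Hypothetical two-table layout: one row per recipe, one row per ingredient.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cocktail_recipes (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    ranking TEXT,
    url TEXT UNIQUE
);
CREATE TABLE IF NOT EXISTS ingredients (
    id INTEGER PRIMARY KEY,
    recipe_id INTEGER NOT NULL REFERENCES cocktail_recipes(id),
    name TEXT NOT NULL,
    amount TEXT
);
"""


def store_recipe(conn, url, name, ranking, ingredients):
    """Insert one recipe and its ingredients in a single transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        cursor = conn.execute(
            "INSERT INTO cocktail_recipes (name, ranking, url) VALUES (?, ?, ?)",
            (name, ranking, url),
        )
        conn.executemany(
            "INSERT INTO ingredients (recipe_id, name, amount) VALUES (?, ?, ?)",
            [(cursor.lastrowid, n, a) for n, a in ingredients.items()],
        )
    return cursor.lastrowid


def cocktails_with(conn, ingredient):
    """Names of all stored cocktails that use the given ingredient."""
    rows = conn.execute(
        "SELECT r.name FROM cocktail_recipes r "
        "JOIN ingredients i ON i.recipe_id = r.id "
        "WHERE i.name = ? ORDER BY r.name",
        (ingredient,),
    )
    return [row[0] for row in rows]
```

Setting it up takes two lines, `conn = sqlite3.connect(":memory:")` followed by `conn.executescript(SCHEMA)`. Storing ingredients under their generic, lowercase names, as the prompt requests, is what makes a lookup like `cocktails_with(conn, "gin")` work.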

I integrated this code into the indexer and finally had the cocktail database of my dreams!

Example output from the indexer process
A view of my cocktail database

Next, I will use ChatGPT to help me create an API and UI to browse the cocktail database and choose the drink I want to make today.

Conclusion

This whole project took me roughly 3 hours from end to end. ChatGPT's ability to accelerate my development process blew my mind. The biggest win was the text parsing, which I could not have done myself and which ChatGPT made simple.

However, the more complex a task got, the lower the chances of ChatGPT performing it correctly. When pairing with ChatGPT, it is still the developer's job to break the work down into tasks small enough for it to swallow, at least for now.

I'm excited to see what comes next!

I asked ChatGPT to rewrite the conclusion. Who did it better?

DrinkIndex part II is out! Check it out here.


Director of engineering @ Epsagon (acquired by Cisco). Passionate about effective engineering leadership.