Build a Spam Checker With OpenAI and GPT-3

Paulo Taylor
Published in Better Programming · Nov 7, 2022

By default, the OpenAI API gives you access to several AI models, or engines, each suited to different use cases.

The fine-tuning feature lets you take one of OpenAI’s models/engines, supply it with new training data, and build a new “fine-tuned” model. It’s this fine-tuning feature that we’ll use to build our spam checker.

But first, to build our fine-tuned model, we’ll need some training data. At Call Assistant, we can use data from existing robocalls that our users have screened.

We’ll need examples of telemarketers and robocalls as well as legitimate calls, so that the model can learn to classify text as spam or not spam.

Here’s an example of a telemarketer trying to sell a user some kind of credit solution:

Hi there, this is Sarah again with the credit Pros. I’ve tried calling you a few times with no luck on connecting. We’ve helped hundreds of thousands of people improve their credit, and we’d love to help you as well. So give me a call back at this number as soon as you can. Looking forward to chatting. Thanks.

And here’s an example of a legitimate call:

Hi, I'm Spencer from a children's dentist Dr. Porter's office regarding Mr. Smith. He's gonna be due for his hygiene visit next month. Do you like to schedule that appointment? Please? Give us a call at xxx–xxx–xxx. Once again, our phone number is xxx-xxx-xxx. Have a wonderful day. Bye.

We’ll need to add a separator between the prompt and the result. For this example, we’ll use \n\n###\n\n, as suggested by OpenAI in its tutorials. Using this data, we’ll need to upload a JSONL file with the data formatted as required. Here’s a sample of that; you should add as many examples as possible.
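(A minimal sketch of the format, one prompt/completion pair per line with the prompt ending in the separator. The labels " spam" and " not spam", and the leading space in each completion, are assumptions based on OpenAI's fine-tuning guidelines, not necessarily the exact labels used here.)

{"prompt": "Hi there, this is Sarah again with the credit Pros. We've helped hundreds of thousands of people improve their credit...\n\n###\n\n", "completion": " spam"}
{"prompt": "Hi, I'm Spencer from a children's dentist Dr. Porter's office regarding Mr. Smith. He's gonna be due for his hygiene visit next month...\n\n###\n\n", "completion": " not spam"}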

After compiling the file, we need to start the fine-tuning process. You’ll need a lot of data, the OpenAI CLI, and an API key:

openai api -k sk-YOUR_KEY fine_tunes.create -t file.jsonl -m ada

It may take some time for the fine-tuning to complete, depending on the model and the amount of training data. To track the progress, you can use this command:

openai api -k sk-YOUR_KEY fine_tunes.follow -i ft-aBcDeFgHiJkLmNoP
...
[18:26:40] Fine-tune enqueued. Queue number: 0
[18:26:40] Fine-tune is in the queue. Queue number: 0
[18:29:14] Fine-tune started
[18:30:52] Completed epoch 1/4
[18:32:11] Completed epoch 2/4
[18:33:30] Completed epoch 3/4
[18:34:49] Completed epoch 4/4
[18:35:08] Uploaded model: ada:ft-x-xxxx-xx-xx
[18:35:09] Uploaded result file: file-aBcDeFgHiJ
[18:35:09] Fine-tune succeeded

We now have our new model ready to use. In the following example, I’m passing a similar sentence, and the engine classifies the text as spam:

openai api -k sk-YOUR_KEY completions.create -m ada:ft-x-xxxx-xx-xx -M 4 -p "Hello, this is John from Finance Plus. I've called before,  We've helped other individuals like you improve their credit. Please give me a call later.###"

The reply echoes the prompt followed by the classification, something like this:

Hello, this is John from Finance Plus. I've called before,  We've helped other individuals like you improve their credit. Please give me a call later.###spam

If you use Java, you can try something like this:
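(A minimal sketch, not necessarily the original code: it calls the completions endpoint with the fine-tuned model using Java 11’s built-in HTTP client. The API key, model name, and prompt are placeholders, and a real application would parse the JSON reply — the label comes back in choices[0].text — instead of printing it.)

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SpamChecker {
    public static void main(String[] args) throws Exception {
        String apiKey = "sk-YOUR_KEY";            // placeholder API key
        String model = "ada:ft-x-xxxx-xx-xx";     // your fine-tuned model
        String prompt = "Hello, this is John from Finance Plus. We've helped "
                + "other individuals like you improve their credit.###";

        // Build the JSON request body by hand; a real application would use a JSON library.
        // max_tokens is kept small because we only expect a short label such as "spam".
        String body = "{\"model\":\"" + model + "\","
                + "\"prompt\":\"" + prompt.replace("\"", "\\\"") + "\","
                + "\"max_tokens\":4}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/completions"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Send the request and print the raw JSON reply; the classification
        // ("spam" or the not-spam label) is in choices[0].text.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}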

Performance-wise, it seems to take about 500–900 milliseconds to execute the completion API, but in my experience, the more you use it, the faster it becomes.

Using this approach with AI and GPT-3, we’re able to scan messages for spam while screening calls and notify our Call Assistant users in real time that they’re on a spam call.

Thanks for reading.

