Build a Spam Checker With OpenAI and GPT-3
A simple tutorial to create a spam filter using the fine-tuning API

The OpenAI API by default gives you different AI models or engines that are suited for different cases.
The fine-tuning feature lets you take one of OpenAI’s models/engines, supply it with new training data, and build a new “fine-tuned” model.
It’s this fine-tuning feature that we’ll use to build our spam checker.
But first, to build our fine-tuned model we’ll need some training data.
At Call Assistant we can use the data from existing robocalls that our users have screened.
We’ll need examples of telemarketer calls and robocalls as well as legitimate calls, so that the model can learn to classify text as spam or not spam.
Here’s an example of a telemarketer trying to sell a user some kind of credit solution:
Hi there, this is Sarah again with the credit Pros. I’ve tried calling you a few times with no luck on connecting. We’ve helped hundreds of thousands of people improve their credit, and we’d love to help you as well. So give me a call back at this number as soon as you can. Looking forward to chatting. Thanks.
And here’s an example of a legitimate call:
Hi, I'm Spencer from a children's dentist Dr. Porter's office regarding Mr. Smith. He's gonna be due for his hygiene visit next month. Do you like to schedule that appointment? Please? Give us a call at xxx–xxx–xxx. Once again, our phone number is xxx-xxx-xxx. Have a wonderful day. Bye.
We’ll need to add a separator between the prompt and the result. For this example, we’ll use \n\n###\n\n as suggested by OpenAI in its tutorials. Using this data, we’ll need to upload a JSONL file in which each line is a JSON object with a prompt field (the call transcript followed by the separator) and a completion field (the label). You should add as many examples as possible.
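The training file described above can be sketched in Python as follows. This is a minimal illustration, not the exact file used at Call Assistant: the label strings "spam" and "not spam", the helper names, and the choice to shorten the two quoted transcripts to their opening sentences are all assumptions made for the example.

```python
import json
from typing import Iterable, Tuple

# Separator named in the article; it marks where the prompt ends.
SEPARATOR = "\n\n###\n\n"

# Label strings are an assumption; the article only says "spam or not spam".
SPAM, NOT_SPAM = "spam", "not spam"

# Opening sentences of the two transcripts quoted above.
EXAMPLES = [
    ("Hi there, this is Sarah again with the credit Pros. "
     "I've tried calling you a few times with no luck on connecting.", SPAM),
    ("Hi, I'm Spencer from a children's dentist Dr. Porter's office "
     "regarding Mr. Smith.", NOT_SPAM),
]


def to_jsonl_line(transcript: str, label: str) -> str:
    """Serialize one labeled transcript as a single JSONL line."""
    record = {"prompt": transcript.strip() + SEPARATOR, "completion": label}
    # json.dumps escapes the newlines inside the separator, so each
    # record stays on one physical line.
    return json.dumps(record, ensure_ascii=False)


def to_jsonl(examples: Iterable[Tuple[str, str]]) -> str:
    """Build the full file content, one JSON object per line."""
    return "".join(to_jsonl_line(text, label) + "\n" for text, label in examples)


def write_training_file(examples: Iterable[Tuple[str, str]], path: str) -> None:
    """Write the examples to a JSONL file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(examples))
```

Calling write_training_file(EXAMPLES, "file.jsonl") would produce a file with the name passed to the -t flag in the fine-tuning command. A real training set needs far more than two examples.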
After compiling the file, we need to start the fine-tuning process. You’ll need plenty of training data, the OpenAI CLI, and an API key:
openai api -k sk-YOUR_KEY fine_tunes.create -t file.jsonl -m ada
Fine-tuning may take some time to complete, depending on the model and the amount of training data. To track the progress, you can use this command:
openai api -k sk-YOUR_KEY fine_tunes.follow -i ft-aBcDeFgHiJkLmNoP
...
[18:26:40] Fine-tune enqueued. Queue number: 0
[18:26:40] Fine-tune is in the queue. Queue number: 0
[18:29:14] Fine-tune started
[18:30:52] Completed epoch 1/4
[18:32:11] Completed epoch 2/4
[18:33:30] Completed epoch 3/4
[18:34:49] Completed epoch 4/4
[18:35:08] Uploaded model: ada:ft-x-xxxx-xx-xx
[18:35:09] Uploaded result file: file-aBcDeFgHiJ
[18:35:09] Fine-tune succeeded
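We now have our new model ready to use. In the following example, I pass in a sentence similar to the spam transcript above, and the model classifies the text as spam. The prompt ends with the ### separator so the model knows where the completion should begin: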
openai api -k sk-YOUR_KEY completions.create -m ada:ft-x-xxxx-xx-xx -M 4 -p "Hello, this is John from Finance Plus. I've called before, We've helped other individuals like you improve their credit. Please give me a call later.###"
The reply would be something like this:
Hello, this is John from Finance Plus. I've called before, We've helped other individuals like you improve their credit. Please give me a call later.###spam
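Since the reply echoes the prompt followed by the label, application code has to pull the label out of the returned text. Here is a small sketch of that step, assuming the separator is the ### that ends the prompt in the command above and that the label "spam" is what the model was trained to emit. The function names are hypothetical.

```python
# Separator that ends the prompt in the command above. It is assumed
# to match the separator the model saw during training.
SEPARATOR = "###"


def build_prompt(transcript: str, separator: str = SEPARATOR) -> str:
    """Append the separator so the model knows where the prompt ends."""
    return transcript.strip() + separator


def extract_label(response_text: str, separator: str = SEPARATOR) -> str:
    """Return the text after the last separator, trimmed and lowercased.

    If the separator is absent (only the completion was returned),
    the whole text is treated as the label.
    """
    _, _, tail = response_text.rpartition(separator)
    return tail.strip().lower()


def is_spam(response_text: str) -> bool:
    # startswith tolerates extra tokens after the label, which can
    # happen when the completion may be up to 4 tokens long.
    return extract_label(response_text).startswith("spam")
```

For the reply shown above, extract_label returns "spam" and is_spam returns True.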
If you use Java, you can send the same completion request from your application code.
Performance-wise, a call to the completion API seems to take about 500–900 milliseconds, but in my experience the more you use it, the faster it becomes.
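To check latency figures like these against your own setup, one option is to time each call from the client side. The sketch below is only an illustration: time_calls accepts any function that takes a transcript and returns a label, and fake_classify is a stand-in that sleeps briefly instead of calling the API. Both names are hypothetical.

```python
import time
from statistics import median
from typing import Callable, List


def time_calls(classify: Callable[[str], str], texts: List[str]) -> List[float]:
    """Run classify on each text and return per-call latency in milliseconds."""
    latencies = []
    for text in texts:
        start = time.perf_counter()
        classify(text)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_classify(text: str) -> str:
    """Stand-in for a real API call; sleeps ~10 ms and returns a fixed label."""
    time.sleep(0.01)
    return "spam"


if __name__ == "__main__":
    samples = ["first transcript", "second transcript", "third transcript"]
    results = time_calls(fake_classify, samples)
    print(f"median latency: {median(results):.1f} ms over {len(results)} calls")
```

Swapping fake_classify for a function that wraps the real completion request would measure the full round trip, including network time.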
Using this approach with AI and GPT-3, we’re able to scan messages for spam while screening calls and notify our Call Assistant users in real time that they’re receiving a spam call.
Thanks for reading.