Generate Conversational Podcasts With GPT-2 and Google WaveNet

Listen to your favorite podcast — forever

Sanjeet Chatterjee

Published in Better Programming · Aug 11, 2021 · 3 min read

Logo (image by author)

There have been many generative experiments with GPT-2, ranging from lifelike chatbots to replicating Twitter profiles.

According to the OpenAI blog, GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages.

With podcasts being all the rage, what if we could generate lifelike conversations with GPT-2 and text-to-speech?

Here is an example I generated by fine-tuning GPT-2 on transcripts of Elon Musk’s appearances on the Joe Rogan Experience:

All files for this tutorial can be found here, minus the fine-tuned model.

Preparing the Transcripts

As of this writing, Elon Musk has done three episodes of the Joe Rogan Experience, which adds up to more than eight hours of dialog!

An essential part of fine-tuning is standardising the input text by following a certain format and cleaning up random artifacts. For podcast transcripts, this means clearly separating host and guest dialog as well as removing timestamps.

Since the transcripts were sourced from different websites, there were many inconsistencies. Here are the original transcripts in a single file.

With a bit of regex magic, we can easily clean it up into the desired format. Here’s the code:
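A minimal sketch of this kind of cleanup, assuming the raw transcripts mark timestamps like [00:12:34] or (1:02:03), or put a bare timestamp at the start of a line, and label speakers as "Name:". The alias table and both regexes are hypothetical; adapt them to the quirks of your own sources. Consecutive lines from the same speaker, or lines with no recognised label, are merged into the previous turn so each line of output is one clean "Speaker: text" turn.

```python
import re

# Hypothetical speaker aliases: map whatever labels your sources use to one canonical name.
SPEAKER_ALIASES = {
    "joe": "Joe Rogan",
    "joe rogan": "Joe Rogan",
    "elon": "Elon Musk",
    "elon musk": "Elon Musk",
}

# Bracketed timestamps anywhere ("[12:34]", "(01:02:03)") or bare ones at the start of a line.
TIMESTAMP_RE = re.compile(r"[\[(]\d{1,2}:\d{2}(?::\d{2})?[\])]|^\s*\d{1,2}:\d{2}(?::\d{2})?\b")
# A speaker label at the start of a line, e.g. "Joe Rogan:" or "ELON :".
SPEAKER_RE = re.compile(r"^([A-Za-z][A-Za-z .]*?)\s*:\s*(.*)$")


def clean_transcript(raw: str) -> str:
    """Remove timestamps and emit one 'Speaker: text' turn per line."""
    turns = []  # list of [speaker, utterance]
    for line in raw.splitlines():
        line = TIMESTAMP_RE.sub("", line).strip()
        if not line:
            continue  # blank line, or a line that held only a timestamp
        match = SPEAKER_RE.match(line)
        speaker = SPEAKER_ALIASES.get(match.group(1).strip().lower()) if match else None
        if speaker and (not turns or turns[-1][0] != speaker):
            turns.append([speaker, match.group(2).strip()])  # a new speaker takes the floor
        elif turns:
            # Same speaker again, or no recognised label: continue the previous turn.
            fragment = match.group(2).strip() if speaker else line
            turns[-1][1] = f"{turns[-1][1]} {fragment}".strip()
    dialog = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in turns if utterance)
    return re.sub(r"[ \t]+", " ", dialog)  # collapse runs of spaces left behind by the regexes
```

Bare times inside a sentence (such as "10:30") are deliberately left alone; only bracketed timestamps or ones at the very start of a line are stripped.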

Generating Conversations With GPT-2

Now it’s time to generate brand-new, never-before-uttered conversations.

We will be using the gpt-2-simple library for working with GPT-2. I have created a simplified Colab notebook for this tutorial — for a more in-depth discussion, see Max Woolf’s awesome blog post.

Setup

Install required libraries and run imports.

Train

  1. For saving and loading the model, it’s best to mount your Google Drive. Drag the formatted transcripts file into your home Drive folder.
  2. Ensure you have a GPU runtime via Runtime > Change runtime type.
  3. The ‘medium’ model (355M parameters, roughly 1.5GB) seems to work best for generating conversational dialog. Due to memory constraints, the larger models cannot be fine-tuned on the Colab free tier. Here’s the code:
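Steps 1 and 3 might look something like the following with gpt-2-simple. The transcript file name is a hypothetical placeholder, and "355M" is gpt-2-simple’s name for the medium checkpoint. The library is imported inside the function so the definitions load even where it isn’t installed; in a Colab cell you would first run !pip install -q gpt-2-simple.

```python
# Hypothetical file name for the cleaned transcripts sitting in your home Drive folder.
TRANSCRIPTS_FILE = "elon_musk_jre.txt"
MODEL_NAME = "355M"  # gpt-2-simple's name for the "medium" GPT-2 model


def prepare_colab(file_name: str = TRANSCRIPTS_FILE, model_name: str = MODEL_NAME) -> None:
    """Mount Google Drive, copy the transcripts into the runtime and fetch the base model."""
    import gpt_2_simple as gpt2  # imported lazily so this file loads without the library

    gpt2.mount_gdrive()                        # asks you to authorise Drive access in Colab
    gpt2.copy_file_from_gdrive(file_name)      # copies My Drive/<file_name> to the working directory
    gpt2.download_gpt2(model_name=model_name)  # downloads the checkpoint into models/<model_name>
```

Calling prepare_colab() once per fresh runtime is enough; the downloaded model stays in the runtime until it is reset.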

Now it’s time to fine-tune GPT-2 — this should take some time, so grab a quick coffee ☕️. Once finished, the model is saved to your Drive for future use. Here’s the code for that task:
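With gpt-2-simple, the fine-tuning step could be sketched roughly as follows, assuming Drive is already mounted. The step count, logging intervals and run name are hypothetical starting points, not the settings behind the published episodes; gpt2.finetune and gpt2.copy_checkpoint_to_gdrive are the library’s own functions.

```python
# Hypothetical settings: a starting point to experiment with, not the values behind the episodes.
FINETUNE_KWARGS = dict(
    model_name="355M",     # the medium model downloaded earlier
    steps=1000,            # total training steps; more steps fit the transcripts more closely
                           # (and risk memorising them)
    restore_from="fresh",  # start from the base model instead of an earlier checkpoint
    print_every=10,        # log the training loss every 10 steps
    sample_every=200,      # print a sample every 200 steps to eyeball progress
    save_every=500,        # write a checkpoint every 500 steps
)


def finetune_on_transcripts(dataset: str, run_name: str = "run1") -> None:
    """Fine-tune GPT-2 on the cleaned transcripts and back the checkpoint up to Drive."""
    import gpt_2_simple as gpt2  # imported lazily so this file loads without the library

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, dataset=dataset, run_name=run_name, **FINETUNE_KWARGS)
    # Checkpoints land in checkpoint/<run_name>; copy them to Drive so they survive a restart.
    gpt2.copy_checkpoint_to_gdrive(run_name=run_name)
```

Watching the periodic samples is a practical way to decide whether to stop early or train for longer.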

Generate

  1. For demonstration purposes, restart your runtime, re-run the imports and load your model back from Drive.
  2. Now it’s time to generate conversations. Feel free to experiment with parameters. Be warned — it can get quite crazy sometimes!
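Reloading and sampling could look like the sketch below, assuming the checkpoint was saved to Drive under the hypothetical run name "run1" and Drive has been mounted again after the restart. The sampling values are illustrative knobs to experiment with, not recommendations from the original notebook.

```python
# Hypothetical sampling settings; temperature and top_p are the main knobs to play with.
GENERATE_KWARGS = dict(
    prefix="Joe Rogan:",  # seed each sample with the host's speaker label
    length=500,           # tokens per sample
    temperature=0.8,      # higher values give wilder, less coherent dialog
    top_p=0.9,            # nucleus sampling over the smallest token set covering 90% of the probability
    nsamples=5,           # total samples; must be a multiple of batch_size
    batch_size=5,
)


def load_and_generate(run_name: str = "run1") -> list:
    """Restore the fine-tuned checkpoint from Drive and return generated samples as strings."""
    import gpt_2_simple as gpt2  # imported lazily so this file loads without the library

    gpt2.copy_checkpoint_from_gdrive(run_name=run_name)  # Drive must already be mounted
    sess = gpt2.start_tf_sess()
    gpt2.load_gpt2(sess, run_name=run_name)
    return gpt2.generate(sess, run_name=run_name, return_as_list=True, **GENERATE_KWARGS)
```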

When cherry-picking conversations to curate, I have found it much easier to work with batches of generated text saved to files.
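One way to do this is a pair of small helpers that write each batch of samples to its own numbered file, separated by a line of "=" characters, and read them back for review. The file naming scheme and separator are assumptions; gpt-2-simple’s generate function can also write samples straight to disk through its destination_path argument.

```python
from pathlib import Path

SEPARATOR = "=" * 20  # hypothetical delimiter line between samples in a batch file


def save_batch(samples: list, out_dir, batch_index: int) -> Path:
    """Write one batch of generated samples to out_dir/batch_NNN.txt and return its path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"batch_{batch_index:03d}.txt"
    body = f"\n{SEPARATOR}\n".join(sample.strip() for sample in samples)
    path.write_text(body + "\n", encoding="utf-8")
    return path


def load_batch(path) -> list:
    """Read a batch file back into a list of samples for cherry-picking."""
    text = Path(path).read_text(encoding="utf-8")
    return [chunk.strip() for chunk in text.split(SEPARATOR) if chunk.strip()]
```

Zero-padded batch numbers keep the files in generation order when listed alphabetically.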

Adding Voices With Google WaveNet

As of this writing, Google’s WaveNet voices still seem to be the most lifelike available through an API. You can experiment with different voice parameters here.

  1. pip install --upgrade google-cloud-texttospeech
  2. Set up credentials by following these steps.
  3. Update config.json with required parameters and run speak.py.
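A minimal sketch of what a script like speak.py might do. The config layout is entirely an assumption (keys, voice names and pitch/rate values are illustrative), as is the parse_turns helper. It uses the google-cloud-texttospeech client’s TextToSpeechClient.synthesize_speech to voice each "Speaker: text" line with that speaker’s WaveNet voice.

```python
import json

# Hypothetical config.json contents; the real file's keys may differ.
EXAMPLE_CONFIG = """
{
  "input_file": "conversation.txt",
  "output_file": "episode.mp3",
  "language_code": "en-US",
  "voices": {
    "Joe Rogan": {"name": "en-US-Wavenet-D", "pitch": -2.0, "speaking_rate": 1.0},
    "Elon Musk": {"name": "en-US-Wavenet-B", "pitch": 0.0, "speaking_rate": 1.1}
  }
}
"""


def parse_turns(dialog: str) -> list:
    """Split 'Speaker: text' lines into (speaker, text) pairs, skipping empty turns."""
    turns = []
    for line in dialog.splitlines():
        speaker, sep, text = line.partition(":")
        if sep and text.strip():
            turns.append((speaker.strip(), text.strip()))
    return turns


def synthesize(dialog: str, config: dict) -> bytes:
    """Voice each turn with its speaker's WaveNet voice and join the MP3 segments."""
    from google.cloud import texttospeech  # pip install --upgrade google-cloud-texttospeech

    # Credentials come from Application Default Credentials,
    # e.g. the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    client = texttospeech.TextToSpeechClient()
    segments = []
    for speaker, text in parse_turns(dialog):
        voice = config["voices"].get(speaker)
        if voice is None:
            continue  # skip lines from speakers without a configured voice
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=texttospeech.VoiceSelectionParams(
                language_code=config["language_code"], name=voice["name"]
            ),
            audio_config=texttospeech.AudioConfig(
                audio_encoding=texttospeech.AudioEncoding.MP3,
                pitch=voice["pitch"],
                speaking_rate=voice["speaking_rate"],
            ),
        )
        segments.append(response.audio_content)
    return b"".join(segments)  # naive byte-level MP3 concatenation


def main(config_path: str = "config.json") -> None:
    """Read the config and dialog file, then write the synthesized episode to disk."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(config["input_file"], encoding="utf-8") as f:
        dialog = f.read()
    with open(config["output_file"], "wb") as f:
        f.write(synthesize(dialog, config))
```

Joining MP3 segments by simple byte concatenation usually plays fine, but a dedicated audio tool would produce a cleaner file and let you add pauses between speakers.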

And that’s it. Now you can listen to Elon Musk talk endlessly about whatever is on his mind!

Closing Remarks

This was quite a fun side project. I have generated many interesting conversations and have even published some as podcast episodes:

Elon Tusk — CEO of Tuskla

There’s a lot of room for experimentation with this setup: fine-tuning on different guests, trying different voices, or even having multiple characters converse in a group.

More recently, OpenAI has unveiled GPT-3, which contains 175 billion parameters, making it roughly 117 times larger than GPT-2 (175 / 1.5 ≈ 117)! However, as of this writing, it can only be accessed through a private beta API, and generating long-form text may become expensive. So for now, GPT-2 seems to be the better fit for this use case.

Listening to GPT-2 hold a conversation with itself can eerily draw you in. With lifelike voices bringing the illusion to life, what’s next for content generation?

Thanks for reading.

