Fixing YouTube Search with OpenAI’s Whisper
How to use OpenAI’s Whisper for better speech-enabled (audio) search

OpenAI’s Whisper is a new state-of-the-art (SotA) model in speech-to-text. It can almost flawlessly transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise.
The domain of the spoken word has always been somewhat out of reach for ML use cases. Whisper changes that for speech-centric applications. We will demonstrate its power alongside other technologies like transformers and vector search by building a new and improved YouTube search.
Search on YouTube is good but has its limitations, especially when it comes to answering questions. With the enormous amount of content on the platform, there should be an answer to almost every question.
Yet if we ask a specific question like “what is OpenAI’s CLIP?”, instead of a concise answer we get a list of long videos that we must watch through to find the answer.
What if all we want is a short 20-second explanation? The current YouTube search has no solution for this. Maybe there’s a good reason to encourage users to watch as much of a video as possible (more ads, anyone?).
Whisper is the solution to this problem and many others involving the spoken word. This article will explore the idea behind a better speech-enabled search.
The Idea
We want to get specific timestamps that answer our search queries. YouTube does support time-specific links in videos, so a more precise search with these links should be possible.
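As a quick illustration, a time-specific link is just the normal video URL with a start offset in seconds appended to it. A minimal helper for building one might look like this (the video ID below is only a placeholder):

```python
def timestamped_url(video_id: str, start_seconds: float) -> str:
    """Build a YouTube link that starts playback at a given offset."""
    # youtu.be links accept a `t` query parameter with a start time in seconds.
    return f"https://youtu.be/{video_id}?t={int(start_seconds)}"

# e.g. prints https://youtu.be/VIDEO_ID?t=95
print(timestamped_url("VIDEO_ID", 95.0))
```

If we can find the exact segment of a video that answers a query, returning a link like this drops the viewer straight into the answer.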

To build something like this, we first need to transcribe the audio in our videos to text. YouTube automatically captions every video, and the captions are okay — but OpenAI just open-sourced something called “Whisper”.
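To give a sense of how little code transcription takes, here is a minimal sketch using the open-source whisper package; the model size and the audio filename are placeholders, not fixed choices:

```python
# pip install openai-whisper
import whisper

# Load one of Whisper's pretrained checkpoints ("small" is a reasonable
# trade-off between speed and accuracy; larger models are more accurate).
model = whisper.load_model("small")

# Transcribe audio extracted from a video (placeholder filename).
result = model.transcribe("video_audio.mp3")

# Alongside the full transcript, Whisper returns time-aligned segments,
# which is exactly what we need for timestamped search results.
for segment in result["segments"]:
    print(f"[{segment['start']:6.1f}s -> {segment['end']:6.1f}s] {segment['text'].strip()}")
```

Each segment carries start and end times in seconds, so pairing it with the timestamped-link helper above gives us search results that jump straight to the relevant moment.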