Automatically Create NBA Highlights With a Few Lines of Python Code

Leveraging open-source computer vision models to generate basketball highlights

Noam Ephrat
Better Programming


Photo by Tim Hart on Unsplash

We’re living in a world of fast content consumption, led by the likes of TikTok, Snapchat, Instagram, Twitter, Facebook, YouTube, and more.

Younger fans are embracing new ways of engaging with leagues, placing less and less importance on watching games live.

Among U.S. sports fans ages 18–34, 58% of MLB fans, 54% of NBA fans, and 48% of NFL fans say they prefer watching highlights over full games, according to a survey by Variety Intelligence Platform (full article).

SOURCE: MARU GROUP FOR VARIETY INTELLIGENCE PLATFORM

As a die-hard NBA fan living halfway around the world from the US, where these games take place in the middle of my night, there is no bigger consumer of these highlights than I am.

Needing no further motivation, I set out to create an automated process for generating highlights of NBA games.

What

The goal of this project was to create these highlights using completely open-source technologies while keeping the process as simple as possible.

How

There are many ways to find points of interest in a game: sound analysis, motion detection, and so on. There are companies that make a business out of exactly this, using complex models to identify points of interest and create highlight clips. But more complex inputs and models do not necessarily produce more accurate results. Instead, I decided to rely on a steady and clear signal: a timestamped play-by-play account of the game, matched against the game clock. So simple, yet so accurate. This solution will not necessarily work in every sport, but in the NBA (and basketball in general) the clock is sacred, so reliability is very high with little to no effort, and very little computation is required.

There are three parts to this solution:

  1. Building a simple play-by-play scraper to get the data for the game we are creating highlights for.
  2. Using the open-source Tesseract OCR model to find the current game clock and quarter in the game film.
  3. Comparing the clock and quarter we extracted from the frames with our play-by-play data.

…and if we have a match — voilà! We have a highlight!

Project flow diagram. Image by author

Now let’s get to the technicalities.

This project was written in Python but can easily be replicated in any language.
Before we get started, these are the libraries used in this project:

import pandas as pd            # holds the play-by-play data as a dataframe
import pytesseract             # Python wrapper for the Tesseract OCR engine
import cv2                     # OpenCV, for reading and preprocessing video frames
from moviepy.editor import *   # cutting and concatenating the highlight clips
import json                    # parsing the NBA API responses
import requests                # fetching the NBA API endpoints

Building a play-by-play scraper

Play-by-play data provides a transcript of the game as a sequence of individual events.

Some examples of the data that can be found in the play-by-play data:

  • time of possession (on the game clock)
  • quarter in which the possession occurred
  • the player who initiated the possession (in the case of a steal or defensive rebound)
  • the opposing player who initiated the possession (in the case of a missed shot or turnover)
  • the location on the floor the shot was taken from, and other unique identifiers used to classify the type of possession
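
To make this concrete, a single play-by-play event looks roughly like the following (the field names match the columns used later in this article; the values are invented for illustration):

action = {
    "clock": "PT07M32.00S",   # game clock as an ISO-8601-style duration (7:32 remaining)
    "period": 1,              # the quarter
    "description": "Player 25' 3PT (3 PTS)",
    "teamTricode": "BOS",
    "shotResult": "Made",
    "actionType": "3pt",
}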

The data we are scraping is from the official NBA APIs (data.nba.net and cdn.nba.com). The data is stored in JSON format. To get the data in a more traditional format, like a dataframe, we just need to run a few lines of code.

The first part provides the games that were played on a given date:

# the date is formatted as YYYYMMDD, e.g. "20220330"
jsn = f"https://data.nba.net/10s/prod/v1/{date}/scoreboard.json"
page = requests.get(jsn)
j = json.loads(page.content)

Next, we check that the given teams did indeed play on that date. If they did, we take the game ID and insert it into the next API.
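
A minimal sketch of that lookup might look like this (the games, hTeam, vTeam, triCode, and gameId field names reflect the data.nba.net response as I understand it, and home_team and away_team are assumed user-supplied tricodes, so treat the details as an assumption):

game_id = None
for game in j['games']:
    # compare the two three-letter team codes, e.g. "LAL" and "BOS"
    if {game['hTeam']['triCode'], game['vTeam']['triCode']} == {home_team, away_team}:
        game_id = game['gameId']
        break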

We extract the data and load it into a pandas dataframe:

raw_game = f'https://cdn.nba.com/static/json/liveData/playbyplay/playbyplay_{game_id}.json'
page = requests.get(raw_game)
j = json.loads(page.content)
df = pd.DataFrame(j['game']['actions'])  # one row per play-by-play action

The results look something like this:

Pandas dataframe containing the play-by-play data

From here we can filter for any type of play that interests us (baskets, steals, blocks, etc.). In this case, I’m taking only made baskets and excluding free throws.

# keep only the columns we need
ndf = df[['clock', 'period', 'description', 'teamTricode', 'shotResult', 'actionType']]

# keep only made baskets, excluding free throws
ndf = ndf[ndf['shotResult'] == 'Made']
ndf = ndf[ndf['actionType'] != 'freethrow']

Now that we have a dataset of all points of interest in the game, it’s time to find them in the actual game footage.

Processing the game film

The next step is to process the game film into frames and manipulate them using the OpenCV module.

cap = cv2.VideoCapture(video_path)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # stop once the video runs out of frames
        break

Each frame will then go through a series of manipulations (pre-processing) to prepare it for OCR (more on OCR coming up).

The steps:

  1. Crop the original frame, keeping only the bottom third, where the game clock is located (so we don’t waste time reading the entire frame).
  2. Convert the image to grayscale.
  3. Convert the grayscale image to black & white.
  4. Apply Gaussian blur to the black & white image.

height, width = frame.shape[:2]                         # frame.shape is (height, width, channels)
crop_img = frame[height - height // 3:height, 0:width]  # keep only the bottom third
gray = cv2.cvtColor(crop_img, cv2.COLOR_BGR2GRAY)
bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
blurred = cv2.GaussianBlur(bw, (5, 5), 0)

Preprocessing stages: original frame, cropped bottom third, grayscale, black & white, Gaussian blur

Tesseract

Tesseract is an open-source optical character recognition (OCR) engine, and one of the most popular and highest-quality OCR libraries available.

OCR uses artificial intelligence to find and recognize text in images.

Tesseract looks for patterns in pixels, letters, words, and sentences.

Out of the box, it can recognize more than 100 languages, and it can be trained to recognize others.

In short, the engine searches for and extracts text from images.

In this project, we are using python-tesseract (pytesseract), a Python wrapper for Google’s Tesseract-OCR engine.

Tesseract has 14 page segmentation modes (numbered 0–13). We will use mode 11: “Sparse text. Find as much text as possible in no particular order.”

Tesseract implementation

The next step is to run our processed frame through the tesseract engine.

data = pytesseract.image_to_string(blurred, lang='eng', config='--psm 11')
Tesseract text detection output

Now that we have all of the text in the frame, we can search for the information we want, i.e., the current quarter and the game clock.

Continuing with the example from above:

The quarter is “1st” and the game clock is “7:32”. In the Tesseract output, however, the quarter is detected as “ist”: the number “1” was mistaken for the letter “i”.

So the next step is to create a mapping of the common OCR misreadings of each quarter:

firstQ  = ['1st', 'ist', 'ast']
secondQ = ['2nd', '2n', 'znd']
thirdQ  = ['3rd', '3r', '3r0', '3ro', '37d', '3fd', '31d']
fourthQ = ['4th', '4t', '47h', '41h', '4h']
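
With these mappings in place, a small helper (my own sketch; the function name and the clock regex are additions, not part of the original script) can pull the quarter and the clock out of the raw Tesseract output:

import re

QUARTER_MAP = {1: firstQ, 2: secondQ, 3: thirdQ, 4: fourthQ}

def parse_frame_text(text):
    # find a clock-like token such as "7:32" anywhere in the OCR output
    clock_match = re.search(r'\b(\d{1,2}:\d{2})\b', text)
    clock = clock_match.group(1) if clock_match else None

    # match each OCR token against the known misreadings of each quarter
    quarter = None
    for token in text.lower().split():
        for q, spellings in QUARTER_MAP.items():
            if token in spellings:
                quarter = q
    return quarter, clock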

Matching the play-by-play data with the data from the game footage

Let’s summarize what we have so far: we built a play-by-play scraper, manipulated game footage frames, and ran them through the Tesseract engine to extract the text from the frame.

Now it’s time to combine the data points and find the highlights.

We do this by processing each frame of the game footage as discussed above and checking if the extracted combination of quarter and game clock matches the quarter and game clock of a highlight.

If we have a match, we save the frame number:

curr_frame = cap.get(cv2.CAP_PROP_POS_FRAMES)  # position of the frame we just matched
highlight_frames.append(curr_frame)            # the list we piece together below
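
One detail worth noting: in the cdn.nba.com play-by-play, the clock field is an ISO-8601-style duration (e.g. “PT07M32.00S”), while the OCR output matches the on-screen format (“7:32”). A small helper, my own addition assuming that duration format, can normalize the play-by-play clock before comparing:

import re

def pbp_clock_to_display(pbp_clock):
    # convert e.g. "PT07M32.00S" to "7:32" to match the on-screen clock
    m = re.match(r'PT(\d+)M(\d+)', pbp_clock)
    if not m:
        return None
    return f'{int(m.group(1))}:{m.group(2)}'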

Piecing together the frames

Once we’re finished going through all of the frames in the game and we have a list of all of the frames where highlights occurred, all we have to do is piece them together.

We will do it using the moviepy module.

Moviepy allows us to cut subclips out of a full video. To capture the entire play leading up to the highlight, we take a few seconds before and a couple of seconds after the exact second in which the highlight occurred.

clips = []
for frame in highlight_frames:
    # start 4 seconds before the highlight and end 2 seconds after it
    clip = video.subclip(round(frame / fps) - 4, round(frame / fps) + 2)
    clips.append(clip)

final_clip = concatenate_videoclips(clips)
final_clip.write_videofile(output_path,
                           codec='libx264',
                           audio_codec='aac',
                           fps=fps)

Example of output highlights

Now that we’re done creating the highlights, let’s see the result…

All logos and footage are property of the NBA and its affiliates

Improving efficiency

This is a basic script for finding points of interest and creating highlight clips, but it is far from efficient.

The efficiency can easily be improved in a variety of ways, to list a few:

  • Not checking every frame: sampling one frame per second instead of every frame reduces the number of frames processed in a 2-hour, 60 fps game from roughly 432,000 to 7,200 (see the sketch after this list).
  • Locating the coordinates of the game clock and running only that area through the OCR engine.
  • Looking for changes in the score instead of the game clock (if you only want baskets scored).
  • Using multiprocessing to work in parallel, and using a GPU.
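
As a sketch of the first point (reusing the cap capture object from earlier), sampling one frame per second with OpenCV could look like this:

fps = cap.get(cv2.CAP_PROP_FPS)
frame_idx = 0
while cap.isOpened():
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)  # jump straight to the next sampled frame
    ret, frame = cap.read()
    if not ret:
        break
    # ... preprocess the frame and run OCR, as before ...
    frame_idx += int(fps)  # advance one second of footage per iteration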

To sum it up

This is just a small example of how open-source technologies can be utilized to create incredible products.

That’s it for now — I hope you found it interesting, and please let me know if you have any questions or comments!
