From Hours to Seconds: A Tale of SQL Query Optimization
An example of a query optimization technique

It was Monday noon. When a new feature was released to production, I knew it would result in a large number of new records being added to the table: movies
. But I had no idea what was going to happen next.
The problem
I monitored one of our customer’s logs and waited for a major flow to complete before confirming that the release had not broken anything. As time passed, the flow failed to complete, and I realized something had gone wrong.
Because it was a critical process and a customer was waiting on the other side, I had to fix it right away. During a debugging session, I discovered that one of our database queries had gotten significantly slower. The operation took several hours to finish.
The “Naughty” Query
SELECT
title,
director,
time
FROM movies
WHERE
idStudio= 10 AND
genre IN ('Crime Thriller', 'Spy Thriller', 'Legal Thriller') AND
time > '1970-01-01'
ORDER BY time ASC
LIMIT 10000
It’s worth noting the following about the movies
table:
- It’s an InnoDB table.
- The primary key of the table is
id
, which is an Auto Increment column, that is used to give each row a unique identity. - Two of the table’s indexes are:
a.KEY `idx_time` (`time`)
b.KEY `idx_idStudio_genre_time` (`idStudio`,`genre `,`time`)
Notice that:
- The index
idx_idStudio_genre_time
covers the columns used in the “naughty” query'sWHERE
clause. - There are 2 columns (
title
,director
) in theSELECT
clause that are not covered by any of the table’s indexes.
Disclaimer
As you may have guessed, I’m not in the film industry, therefore the movies table is made up, but the table indexes and query structure are based on the original query.
The Road Not Taken — Altering the Table
Initially, my approach was to change the structure of the movies
table (and therefore modifying the query).
I used the explain
statement to gather information about how MySQL optimizer performs the query, and it appeared that the MySQL optimizer did not choose the index I expected. I anticipated the idx_idStudio_genre_time
index to be chosen, but instead, the idx_time
index was chosen.
In addition, we selected two columns (title
and director
) in the query’s SELECT
clause, that were not covered by any index. In that scenario, MySQL iterates over the entire table to retrieve the values for the columns that aren't in the index. This could cause a performance issue.
Moreover, the query has two range conditions: genre IN (...) AND time > '1970-01-01'
. Once the first range condition is met, any subsequent conditions in the query will not use any further indexes. The query performance is unaffected by having both genre
and time
in our index. The index simply takes up additional storage space.
I meant to improve the table indexes and add columns to the table, to address these issues and others. For example, the genres “Crime Thriller,” “Spy Thriller,” and “Legal Thriller” all belong to the same category: “Thriller”. By adding a category
column to the table and indexing it with INDEX(`idStudio`,`category`,`time`)
, we can update the query to use category
instead of genre
, to avoid using a second range condition.
My goal was to improve query performance, however, the approach I came up with had two main drawbacks:
- First, indexes have a significant overhead cost. Writing on the indexed table becomes slower as you add more indexes.
- Second, the table was huge so it took hours to alter it. In order to test my theory, I duplicated our database (to mimic production size tables) and added one column to the test database’s
movies
table. Because the table was so large, the migration took 7 hours.
At this time, it was clear that this was not the path we wanted to take. Changing the movies
table had wider implications and took a long time, and we faced an urgent issue.
The Road Taken — Rewriting the Query
An “only code” solution is the quickest strategy. We can run it manually on the database to get an idea of the query’s performance right away, then merge it into production and test it for all of our customers. If it doesn’t work, we can rollback to the previous query with a single click.
Although it seemed like a long shot, we chose to update the query to leverage the table’s existing index, idx_idStudio_genre_time
, and to reduce the dataset from which the query selects the title
, director
and time
columns.
The effect of rewriting the query, to produce an identical result as the original query, was surprising. The query was no longer performed in hours but in seconds.
The Rewritten Query
SELECT
title,
director,
time
FROM movies
JOIN (
SELECT
id
FROM movies
WHERE
idStudio= 10 AND
genre IN ('Crime Thriller', 'Spy Thriller', 'Legal Thriller')AND
time > '1970-01-01'
ORDER BY time ASC
LIMIT 10000
) innerTable
ON movies.id = innerTable.id
Take note of the following:
- The
JOIN
clause query (the “subquery”) is identical to the “naughty” query. Except that theid
column, which is themovies
table primary key, is selected. - The table produced from the “subquery” is called
innerTable
. - Table
movies
is joined with tableinnerTable
. The columns selected in the “naughty” query, are being selected from the joined tables in the rewritten query.
What’s the Magic?
Hooray! The issue has been resolved. Now, the query returns a result in seconds, the flow is functional for our customer, and everyone is pleased.
But that bothered me… why is it that the query performs so much better? The two queries provide the same output, but their performance on large datasets is substantially different… what’s the trick?
You might be surprised to learn that MySQL indexes the primary key automatically. It means that all of the movies
table indexes include the table's id
column.
idx_idStudio_genre_time
is now the covering index for the “subquery”. This is an index that provides all the necessary results for a query. Since the “subquery” only uses columns that are part of that index, the selected value (id
) can be fetched from the index tree quickly. Unlike the "naughty" query, which required additional table lookup in order to retrieve all of the desired columns.
The PRIMARY KEY
index is used to join the movies
and theinnerTable
tables. The tables are joined using the eq_ref
join type, as described in the explain
output. This is one of "the best possible join types. It is used when all parts of an index are used by the join and the index is a PRIMARY KEY
".
Using (INNER) JOIN
in the query resulted in a significant reduction in the size of the dataset from which we select values. When the dataset was smaller, selecting the title
, director
and time
columns was relatively quick.
Takeaways
- Make a “minimal viable change”. When we spot a problem, we may be tempted to fix all of the problematic aspects, but focusing on the smallest testable solution, that provides value to our customers, allows us to move faster and deliver quick solutions.
- Reduce the number of records in your query’s FROM clause. If you’re having trouble querying large tables, try making the dataset, you’re selecting values from, smaller. Use, for example, the trick I mentioned in this blog.
- Make the most of what you have. Use your current table indexes to your advantage, learn about your database’s “free” capabilities, and put them to creative use.
- Collaborate with your teammates to come up with solutions. Brainstorming is a valuable tool for creating new ideas. This is especially true when your team and you need to think of a new solution.
- Share what you’ve learned. Communicate your new knowledge and insights with others so they can benefit from it as well.
Thanks for reading. I hope this was helpful!
Special thanks to Lital Weiss and Tal Bereznitskey, my colleagues in Torii, for your thoughtful remarks and brainstorming sessions on this problem.