Analyzing Financial Reports With LlamaIndex and OpenAI
How LlamaIndex’s SubQuestionQueryEngine simplifies complex queries

Following up on our last article, which focused on querying both structured and unstructured data using SQLAutoVectorQueryEngine, this article explores LlamaIndex’s SubQuestionQueryEngine, which handles complex queries.
First, let’s look at our use case: analyzing the financial reports of the United States government for the most recent fiscal years, 2021 and 2022. The findings in this article are based on facts from the most recent data available and are apolitical.
Use Case of U.S. Government’s Financial Reports
The financial reports of the United States government are freely downloadable from the website of the Bureau of the Fiscal Service. The financial report provides the President, Congress, and the American people with a comprehensive view of the federal government’s finances.
For our demo app, we will download the executive summaries of the U.S. government’s financial reports for fiscal years 2021 and 2022. We will use these two summary reports as our data sources and ask questions about the content of these two reports.
I selected the executive summaries of those financial reports (about ten pages each) as our data sources, partly to save on token costs, as the original full reports are about 200–300 pages long. Feel free to download the full reports instead if you don’t mind spending a little more on tokens.
SubQuestionQueryEngine
SubQuestionQueryEngine is designed by LlamaIndex to break down a complex query (e.g., compare and contrast) into multiple sub-questions, each paired with a target query engine for execution. After all sub-questions have been executed, their responses are gathered and sent to the response synthesizer to produce the final response.
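To make that flow concrete, here is a minimal, library-free sketch of the decompose, route, and synthesize pattern in plain Python. It is not LlamaIndex’s implementation: the names SubQuestion, decompose, answer_complex_query, and join_answers are hypothetical, the two "engines" are stand-in functions that return placeholder strings, and the decomposition step (done by an LLM in the real engine) is replaced by a trivial rule that asks every engine the same question.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SubQuestion:
    text: str       # the narrower question to ask
    tool_name: str  # the query engine that should answer it


def decompose(query: str, tool_names: List[str]) -> List[SubQuestion]:
    # Assumption: a trivial stand-in for the LLM-driven decomposition step.
    # It simply creates one sub-question per available engine.
    return [
        SubQuestion(text=f"{query} (according to {name})", tool_name=name)
        for name in tool_names
    ]


def join_answers(query: str, answers: List[Tuple[SubQuestion, str]]) -> str:
    # Assumption: a stand-in for the response synthesizer. It only
    # concatenates the per-engine answers, one per line.
    return "\n".join(f"{sq.tool_name}: {answer}" for sq, answer in answers)


def answer_complex_query(
    query: str,
    engines: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str, List[Tuple[SubQuestion, str]]], str],
) -> str:
    # 1. Break the complex query into sub-questions with target engines.
    sub_questions = decompose(query, list(engines))
    # 2. Run each sub-question against its target engine.
    answers = [(sq, engines[sq.tool_name](sq.text)) for sq in sub_questions]
    # 3. Gather all responses and synthesize the final response.
    return synthesize(query, answers)


# Hypothetical engines, one per report; real ones would query an index.
engines = {
    "fy2021_report": lambda q: "<answer retrieved from the FY2021 summary>",
    "fy2022_report": lambda q: "<answer retrieved from the FY2022 summary>",
}

final_response = answer_complex_query(
    "What was the total revenue?", engines, join_answers
)
print(final_response)
```

Keeping one engine per report mirrors the idea of a target query engine for each sub-question: a compare-and-contrast question naturally splits into one lookup per fiscal year, followed by a single synthesis step.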
Given our use case, we come up with the following high-level architectural diagram of how SubQuestionQueryEngine tackles complex questions related to the U.S. government’s financial reports.