Better Programming

Building Production-Ready LLM Apps with LlamaIndex: Document Metadata for Higher Accuracy Retrieval

Wenqi Glantz
Published in Better Programming
8 min read · Aug 17, 2023
Image by Craig Glantz from Canva

We’ve developed quite a few POC LLM apps with a variety of stacks so far, mainly centered around LlamaIndex and its ecosystem. Starting with this article, I hope to explore the different areas that make up production-ready LLM apps built with LlamaIndex. Let’s start with document metadata.

Document Metadata

Document metadata is data about data. It is descriptive information about your documents, such as document title, keywords, summary, etc. Document metadata enriches the nodes with additional information that can be used during the retrieval process. Let’s explore how metadata can achieve higher retrieval accuracy in RAG.
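To make the idea concrete, here is a minimal plain-Python sketch, not LlamaIndex's actual classes. The `Node` type, the `text_with_metadata` helper, the document titles, and the passage text are all hypothetical placeholders. It shows how two chunks with identical wording stop being ambiguous once each carries its own metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A chunk of document text plus descriptive metadata (hypothetical type)."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def text_with_metadata(node: Node) -> str:
    """Prepend the metadata to the chunk text, one 'key: value' line per entry."""
    header = "\n".join(f"{key}: {value}" for key, value in node.metadata.items())
    return f"{header}\n\n{node.text}" if header else node.text


# Placeholder passage, invented for illustration only.
passage = "This section discusses the net cost of operations."

nodes: List[Node] = [
    Node(passage, {"document_title": "Financial Report FY2021"}),
    Node(passage, {"document_title": "Financial Report FY2022"}),
]

# The raw text is identical, but the enriched text differs per report.
enriched = [text_with_metadata(node) for node in nodes]
```

Without the metadata, a retriever has no way to tell these two chunks apart. With it, the fiscal year travels with the text into both the retrieval step and the prompt.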

Use Case

Building on one of our previous stories, Analyzing Financial Reports With LlamaIndex and OpenAI, let’s expand the use case: load the US government’s financial reports for both fiscal years 2021 and 2022, then ask questions about the report for one particular fiscal year. This differs slightly from the previous article in that we are not comparing and contrasting the two reports. We are doing straight Q&A on a single year’s report, which is a very common RAG use case.

We are going to illustrate the query results for the same set of questions for both scenarios: with metadata and without metadata. Let’s observe the differences in the responses.

How To Add Metadata for Your Documents

You may think adding metadata to your documents must be a daunting task. Luckily, LlamaIndex recently introduced the MetadataExtractor module. The metadata extractor uses LLMs to extract contextual information relevant to the document and stores it in each node, which helps the retrieval and language models disambiguate similar-looking passages.

Simply put, the metadata extractor uses LLMs to auto-generate metadata for your documents, and that metadata in turn makes retrieval more accurate. Brilliant!
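The flow described above can be sketched in a few lines of plain Python. This is an illustration of the idea under stated assumptions, not the MetadataExtractor API: `extract_metadata`, the prompt wording, the metadata keys, and the offline `fake_llm` stand-in are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List


def extract_metadata(
    chunks: List[str], llm: Callable[[str], str]
) -> List[Dict[str, object]]:
    """Ask an LLM for a summary and keywords per chunk and store them with it."""
    nodes: List[Dict[str, object]] = []
    for chunk in chunks:
        # One LLM call per metadata field; the prompts are assumptions.
        metadata = {
            "summary": llm(f"Summarize this passage in one sentence:\n{chunk}"),
            "keywords": llm(f"List keywords for this passage, comma-separated:\n{chunk}"),
        }
        nodes.append({"text": chunk, "metadata": metadata})
    return nodes


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "stub answer for: " + prompt.splitlines()[0]


nodes = extract_metadata(["first chunk", "second chunk"], fake_llm)
```

In a real app, `fake_llm` would be replaced by a call to an actual model. Note that each metadata field costs one extra LLM call per chunk, so extraction cost grows with both the number of chunks and the number of fields.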

Out of the box, the MetadataExtractor module comes with a list of extractors:
