Measuring and improving LLM accuracy for structured outputs
Overview
Natural Language Processing (NLP) is often* considered to be the combination of two branches of study – Natural Language Understanding (NLU) and Natural Language Generation (NLG).
*For example, that is how Ines Montani, co-founder of spaCy, recently described the fields in a podcast interview.
Large Language Models can do both NLU and NLG. In this course we are primarily interested in the NLU aspect – more specifically, in how to extract structured information from free-form text. (There is also an NLG aspect to the course, which you will notice as you watch the video lessons.)
Recently, both GPT and Gemini introduced the ability to extract structured output from the prompt text. As of this writing (November 2024), they are the only LLMs that provide native support for this feature through their APIs: you specify the response schema as a Python class, and the model returns a "best effort" response that is guaranteed to follow the schema. It is "best effort" because, while the response always conforms to the schema, some of its fields may come back empty.
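To make this concrete, here is a minimal sketch of what specifying the response schema as a Python class can look like, using OpenAI's Python SDK with a Pydantic model. The `Person` schema, the prompt text, and the model name are illustrative assumptions, not material from the course.

```python
from typing import Optional

from openai import OpenAI
from pydantic import BaseModel


# Hypothetical schema: the fields here are illustrative only.
class Person(BaseModel):
    name: str
    employer: Optional[str]  # may come back as None (the "best effort" caveat)


client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The beta `parse` helper accepts a Pydantic class as the response format
# and parses the model's reply into an instance of that class.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the person mentioned in the text."},
        {"role": "user", "content": "Ines Montani is a co-founder of spaCy."},
    ],
    response_format=Person,
)

person = completion.choices[0].message.parsed
print(person)  # e.g. name='Ines Montani' employer='spaCy'
```

Gemini offers an analogous mechanism (a `response_schema` option in its generation config), though the exact API shape differs from the sketch above.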
This course provides a practical, systematic approach to assessing the accuracy of LLM structured-output responses.