Pydantic

Introduction to Pydantic

Pydantic is a data validation library for Python. Suppose we receive some user attributes. This data could have come from anywhere: user input, an API call, or somewhere else entirely. For now, suppose it is some information that the user has provided about themselves via a form.

from rich.pretty import pprint

user_attributes = {
    "name": "Keanu Reeves",
    "age": 873, # because we all know Keanu Reeves is immortal
    "email": "jwick@email.com",
    "pets": ["dog"]
}

We might want this input to always have the following form:

user_attributes = {
    "name" : str,
    "age" : int,
    "email" : str,
    "pets" : list[str],
}

If you've taken an intro-level Python course, you might have been introduced to objects by creating classes such as people or cars, with attributes and methods. So, suppose we want to take the dictionary we got from the user input and turn it into an instance of a class.

Using dataclasses

One way to turn our user input into an object is to use a dataclass.

from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int
    email: str
    pets: list[str]

We can now construct a new user by unpacking the dictionary we got from the user input into the constructor.

new_user = User(**user_attributes)
print(type(new_user))
pprint(new_user, expand_all=True)

<class '__main__.User'>

User(
│   name='Keanu Reeves',
│   age=873,
│   email='jwick@email.com',
│   pets=[
│   │   'dog'
│   ]
)

We might also want to run some checks on our user input. For example, we might want to make sure that the user is at least 18 years old. We can do this by defining a function that checks the age.

def is_over_18(user: User) -> bool:
    return user.age >= 18

is_over_18(new_user)

True

But what happens if one of the fields has the wrong type?

user_attributes["age"] = "873"

wrong_user = User(**user_attributes)

is_over_18(wrong_user)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 5
      1 user_attributes["age"] = "873"
      3 wrong_user = User(**user_attributes)
----> 5 is_over_18(wrong_user)

Cell In[2], line 2, in is_over_18(user)
      1 def is_over_18(user: dict) -> bool:
----> 2     return user.age >= 18

TypeError: '>=' not supported between instances of 'str' and 'int'

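We're being told that you can't compare strings and integers (obviously). Notice, though, that the dataclass accepted the string "873" without complaint: dataclass type annotations are not enforced at runtime, so the problem only surfaced later, when we used the value.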

Now, we could write some manual type checks in our age-checking function, but ideally we would like to keep all of the validation logic in one place.
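For comparison, here is a minimal sketch of the manual route with a plain dataclass: a __post_init__ hook (which dataclasses call right after __init__) that type-checks each field by hand. CheckedUser is a hypothetical name used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckedUser:
    """Hypothetical dataclass that type-checks its own fields by hand."""

    name: str
    age: int
    email: str
    pets: list[str]

    def __post_init__(self) -> None:
        # Dataclasses don't enforce annotations, so each check is written manually.
        for field_name, expected in (("name", str), ("age", int), ("email", str)):
            value = getattr(self, field_name)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{field_name} must be {expected.__name__}, got {type(value).__name__}"
                )
        if not (isinstance(self.pets, list) and all(isinstance(p, str) for p in self.pets)):
            raise TypeError("pets must be a list of str")


try:
    CheckedUser(name="Keanu Reeves", age="873", email="jwick@email.com", pets=["dog"])
except TypeError as err:
    print(err)  # age must be int, got str
```

It works, but it only rejects bad input (it won't convert "873" to 873), and every new field means another hand-written check to maintain.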

This is where Pydantic comes in.

Using Pydantic

from pydantic import BaseModel, Field

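Defining a Pydantic model is similar to defining a dataclass. We subclass BaseModel and declare each field with a type annotation, using the Field function to attach extra metadata such as a title and description. Passing ... (Python's Ellipsis) as the first argument to Field marks the field as required.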

class User(BaseModel):
    name: str = Field(..., title="Name", description="Name of the user")
    age: int = Field(..., title="Age", description="Age of the user")
    email: str = Field(..., title="Email", description="Email of the user")
    pets: list[str] = Field(..., title="Pets", description="Pets of the user")

Pydantic will now automatically check that the data we pass in is of the correct type.

user = User(**user_attributes)
pprint(user, expand_all=True)

is_over_18(user)

User(
│   name='Keanu Reeves',
│   age=873,
│   email='jwick@email.com',
│   pets=[
│   │   'dog'
│   ]
)

True

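Even though age is currently the string "873", Pydantic converted it to the integer 873 for us: by default (in what Pydantic calls lax mode), it tries to convert compatible values to the declared type. But what if we pass in a preposterous value for age?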

user_attributes["age"] = "panda"

try:
    user = User(**user_attributes)
except Exception as e:
    print(e)

1 validation error for User
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='panda', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/int_parsing

It makes no sense for someone to be "panda" years old, and since the string can't be parsed as an integer, Pydantic refuses to create the object and raises a ValidationError instead.
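The conversion of "873" to 873 earlier comes from Pydantic's default lax mode. If you would rather reject numeric strings outright, Pydantic also offers a strict mode. A minimal sketch, with hypothetical model names LaxAge and StrictAge, using ConfigDict(strict=True) to switch strict mode on for a whole model:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxAge(BaseModel):
    age: int  # default (lax) mode: a numeric string such as "873" is coerced


class StrictAge(BaseModel):
    model_config = ConfigDict(strict=True)  # strict mode: no type coercion

    age: int


lax = LaxAge(age="873")
print(repr(lax.age))  # 873

try:
    StrictAge(age="873")
except ValidationError as err:
    print(err)
```

Strict mode can also be enabled for a single field with Field(strict=True), which is handy when only some inputs should be taken literally.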

Another handy feature is that we can build in additional validation checks.

class User(BaseModel):
    name: str = Field(..., title="Name", description="Name of the user")
    age: int = Field(..., title="Age", description="Age of the user", ge=18)
    email: str = Field(..., title="Email", description="Email of the user")
    pets: list[str] = Field(..., title="Pets", description="Pets of the user")

We have passed the extra argument ge=18 to Field. This means that the age must be greater than or equal to 18. If we pass in an age less than 18, Pydantic will raise an error:

from pydantic import ValidationError

user_attributes["age"] = 17

try:
    user = User(**user_attributes)
    pprint(user)
except ValidationError as e:
    pprint(e.errors())

[
│   {
│   │   'type': 'greater_than_equal',
│   │   'loc': ('age',),
│   │   'msg': 'Input should be greater than or equal to 18',
│   │   'input': 17,
│   │   'ctx': {'ge': 18},
│   │   'url': 'https://errors.pydantic.dev/2.9/v/greater_than_equal'
│   }
]

This is handy, because the error tells us what the failure was ('type' and 'msg'), what we passed in ('input'), and which constraint we put in place to prevent it ('ctx').
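Field supports several other built-in constraints. A minimal sketch with a hypothetical ConstrainedUser model (the upper age bound of 150 is an arbitrary illustrative choice), showing that Pydantic collects every failing field into a single ValidationError, each entry carrying the same loc/type/input details as above:

```python
from pydantic import BaseModel, Field, ValidationError


class ConstrainedUser(BaseModel):
    """Hypothetical model combining a few of Field's built-in constraints."""

    name: str = Field(..., min_length=1)
    age: int = Field(..., ge=18, le=150)
    pets: list[str] = Field(..., max_length=5)


errors = []
try:
    ConstrainedUser(name="", age=873, pets=["dog"])
except ValidationError as err:
    errors = err.errors()

# Pydantic reports every failing field at once, not just the first one.
for error in errors:
    print(error["loc"], error["type"], repr(error["input"]))
```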

We can also define our own custom validators. Suppose we want to make sure that the email address at least looks like a valid email address. We can do this by writing a method decorated with field_validator.

from pydantic import field_validator
from pydantic_core import PydanticCustomError

class User(BaseModel):
    name: str = Field(..., title="Name", description="Name of the user")
    age: int = Field(..., title="Age", description="Age of the user", ge=18)
    email: str = Field(..., title="Email", description="Email of the user")
    pets: list[str] = Field(..., title="Pets", description="Pets of the user")

    @field_validator("email")
    @classmethod
    def check_email(cls, v: str) -> str:
        # A deliberately simple check: the address must contain "@" and "."
        if "@" not in v or "." not in v:
            raise PydanticCustomError(
                'InvalidEmail',
                'Email must contain "@" and "."'
            )
        return v

user_attributes = {
    "name": "Keanu Reeves",
    "age": 873,
    "email": "jwick-at-email-dot-com",
    "pets": ["dog"]
}

try:
    user = User(**user_attributes)
    pprint(user)
except ValidationError as e:
    pprint(e.errors(), expand_all=True)

[
│   {
│   │   'type': 'InvalidEmail',
│   │   'loc': (
│   │   │   'email',
│   │   ),
│   │   'msg': 'Email must contain "@" and "."',
│   │   'input': 'jwick-at-email-dot-com'
│   }
]
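Because whatever a field validator returns becomes the field's value, the same hook can clean data as well as reject it. A minimal sketch, assuming a hypothetical NormalisedUser model that lower-cases and trims the email before applying the same check as above:

```python
from pydantic import BaseModel, ValidationError, field_validator
from pydantic_core import PydanticCustomError


class NormalisedUser(BaseModel):
    """Hypothetical model whose email validator also cleans up the value."""

    email: str

    @field_validator("email")
    @classmethod
    def check_and_normalise_email(cls, v: str) -> str:
        v = v.strip().lower()  # the returned value is what gets stored on the model
        if "@" not in v or "." not in v:
            raise PydanticCustomError("InvalidEmail", 'Email must contain "@" and "."')
        return v


print(NormalisedUser(email="  JWick@Email.com ").email)  # jwick@email.com

try:
    NormalisedUser(email="jwick-at-email-dot-com")
except ValidationError as err:
    print(err.errors()[0]["type"])  # InvalidEmail
```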

Application to LLMs

An important application of Pydantic is in validating the output of LLMs. For example, suppose we have the following description:

description = (
    "My name is Ryan, and I am 35 years old. "
    "During the weekends I like to hike, but I also enjoy playing video games. "
    "It can sometimes be difficult to use my computer, "
    "because my cat likes to sleep on the keyboard! "
    "During the week, I work as a MLE at the University of Cambridge. "
    "Although I really enjoy living in the UK, "
    "I miss the outdoors back home in NZ."
)

We can now use a system prompt to ask the LLM to extract the information we want.

system_prompt = (
    "Your main role is to analyse a piece of unstructured text and extract the following information:\n"
    "- Name\n"
    "- Age\n"
    "- Nationality\n"
    "- Occupation\n"
    "- A list of any pets\n"
    "- A list of any hobbies\n\n"
    "If any acronyms are used, please expand them.\n\n"
    "Here is a description:\n\n"
)

This information is clearly contained within the text, and any human with basic comprehension skills can extract or infer it.

Let's first try using an LLM to extract this information, similarly to how we might prompt ChatGPT:

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": description},
    ],
    temperature=0.0,
)

print(response.choices[0].message.content)

- Name: Ryan
- Age: 35
- Nationality: New Zealander (NZ)
- Occupation: Machine Learning Engineer (MLE) at the University of Cambridge
- A list of any pets: Cat
- A list of any hobbies: Hiking, playing video games

This might initially look good, but we can't really do anything useful with this information programmatically, since it's still unstructured text. It looks structured, but parsing it would be annoying and fragile. I could write something that looked for the colon and took the text afterwards, but then what about the lists of items? And the text in brackets? Or any extraneous text the model decides to add?
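To make the problem concrete, here is a minimal sketch of that colon-splitting approach, applied to a copy of the output printed above. naive_parse is a hypothetical helper written only for illustration; note that the age comes back as a string, the hobbies as one comma-separated string, and the bracketed text is kept as-is.

```python
# The free-text output printed above, copied in as a string.
raw_output = """- Name: Ryan
- Age: 35
- Nationality: New Zealander (NZ)
- Occupation: Machine Learning Engineer (MLE) at the University of Cambridge
- A list of any pets: Cat
- A list of any hobbies: Hiking, playing video games"""


def naive_parse(text: str) -> dict[str, str]:
    """Hypothetical helper: split each '- Key: value' line on its first colon."""
    parsed = {}
    for line in text.splitlines():
        key, _, value = line.lstrip("- ").partition(":")
        parsed[key.strip()] = value.strip()
    return parsed


parsed = naive_parse(raw_output)
print(repr(parsed["Age"]))                    # '35', a string rather than an int
print(repr(parsed["A list of any hobbies"]))  # one string rather than a list
```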

Instead, I can try to get the output in a structured format.

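First, we define a new Pydantic model describing the shape we want the output to have. Each field is typed as, for example, str | None, so a value may be null if that information isn't in the text; because of the ..., however, every key must still be present.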

class Person(BaseModel):
    name: str | None = Field(..., description="The name of the person")
    age: int | None = Field(..., description="The age of the person")
    nationality: str | None = Field(..., description="The nationality of the person")
    occupation: str | None = Field(..., description="The occupation of the person")
    pets: list[str] | None = Field(..., description="The pets of the person")
    hobbies: list[str] | None = Field(..., description="The hobbies of the person")

    # print the description of the person
    def __str__(self) -> str:
        output = f"Name: {self.name}\n"
        output += f"Age: {self.age}\n"
        output += f"Nationality: {self.nationality}\n"
        output += f"Occupation: {self.occupation}\n"
        output += f"Pets: {self.pets}\n"
        output += f"Hobbies: {self.hobbies}\n"
        return output

Now we explicitly ask the LLM to output JSON.

system_prompt = (
    "Your main role is to analyse a piece of unstructured text and extract the following information:\n"
    "- Name\n"
    "- Age\n"
    "- Nationality\n"
    "- Occupation\n"
    "- A list of any pets\n"
    "- A list of any hobbies\n\n"
    "If any acronyms are used, please expand them.\n"
    "Return the information in JSON format.\n\n" # <--- JSON please!
    "Here is a description:\n\n"
)

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": description},
  ],
  max_tokens=512,
  temperature=0.0,
)

print(response.choices[0].message.content)

```json
{
  "Name": "Ryan",
  "Age": 35,
  "Nationality": "New Zealander",
  "Occupation": "Machine Learning Engineer",
  "Pets": ["cat"],
  "Hobbies": ["hiking", "playing video games"]
}
```

Great! We have valid JSON! Except we don't: the output is wrapped in those annoying json code-fence tags, which json.loads can't parse. OK, so now I can ask the LLM not to include them.
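If prompting alone isn't reliable, a defensive option is to strip the fence in code before parsing. A minimal sketch, assuming a hypothetical strip_code_fence helper and a shortened copy of the reply above (the fence string is built with "`" * 3 only to avoid writing a literal fence inside this example):

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built here to avoid a literal one

# A shortened version of the fenced reply shown above.
reply = FENCE + 'json\n{"Name": "Ryan", "Age": 35}\n' + FENCE


def strip_code_fence(text: str) -> str:
    """Hypothetical helper: drop an opening fence line (e.g. one tagged json) and a closing fence line."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


try:
    json.loads(reply)
except json.JSONDecodeError as err:
    print("Raw reply is not valid JSON:", err)

print(json.loads(strip_code_fence(reply)))  # {'Name': 'Ryan', 'Age': 35}
```

This is only a fallback; getting the model not to add the tags in the first place is the cleaner fix.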

system_prompt = (
    "Your main role is to analyse a piece of unstructured text and extract the following information:\n"
    "- Name\n"
    "- Age\n"
    "- Nationality\n"
    "- Occupation\n"
    "- A list of any pets\n"
    "- A list of any hobbies\n\n"
    "If any acronyms are used, please expand them.\n"
    "Return the information in JSON format. "
    "Do not include the `json` tags.\n\n" # <--- no tags please!
    "Here is a description:\n\n"
)

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": description},
  ],
  max_tokens=512,
  temperature=0.0,
)

print(response.choices[0].message.content)

{
  "Name": "Ryan",
  "Age": 35,
  "Nationality": "New Zealander",
  "Occupation": "Machine Learning Engineer",
  "Pets": ["cat"],
  "Hobbies": ["hiking", "playing video games"]
}

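That actually worked, and this is valid JSON. This kind of prompt fiddling is what you might have to do with many LLMs out there. However, OpenAI have added a response_format argument to their API to make this easier. Setting response_format={"type": "json_object"} turns on JSON mode, which guarantees that the output is valid JSON (unless the reply is cut off by max_tokens), though not that it follows any particular structure. JSON mode also requires the word "JSON" to appear somewhere in the messages, so we go back to the earlier prompt that simply asks for JSON format.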

system_prompt = (
    "Your main role is to analyse a piece of unstructured text and extract the following information:\n"
    "- Name\n"
    "- Age\n"
    "- Nationality\n"
    "- Occupation\n"
    "- A list of any pets\n"
    "- A list of any hobbies\n\n"
    "If any acronyms are used, please expand them.\n"
    "Return the information in JSON format.\n\n" # <--- JSON please!
    "Here is a description:\n\n"
)

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": description},
  ],
  max_tokens=512,
  temperature=0.0,
  response_format={"type": "json_object"} # <--- JSON please!
)

print(response.choices[0].message.content)

{
  "Name": "Ryan",
  "Age": 35,
  "Nationality": "New Zealander",
  "Occupation": "Machine Learning Engineer",
  "Pets": ["cat"],
  "Hobbies": ["hiking", "playing video games"]
}

Why did we go to all this effort? Well, now we can try to parse the JSON and pass it as keyword arguments to our Person class, just as we did before with Keanu. This is called deserialisation: converting JSON into a Pydantic object.

import json

json_content = json.loads(response.choices[0].message.content)
person = Person(**json_content)

pprint(person, expand_all=True)

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[48], line 4
      1 import json
      3 json_content = json.loads(response.choices[0].message.content)
----> 4 person = Person(**json_content)
      6 pprint(person, expand_all=True)

File ~/Website/large-language-models/venv/lib/python3.11/site-packages/pydantic/main.py:209, in BaseModel.__init__(self, **data)
    207 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    208 __tracebackhide__ = True
--> 209 validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    210 if self is not validated_self:
    211     warnings.warn(
    212         'A custom validator is returning a value other than `self`.\n'
    213         "Returning anything other than `self` from a top level model validator isn't supported when validating via `__init__`.\n"
    214         'See the `model_validator` docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators) for more details.',
    215         category=None,
    216     )

ValidationError: 6 validation errors for Person
name
  Field required [type=missing, input_value={'Name': 'Ryan', 'Age': 3... 'playing video games']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
age
  Field required [type=missing, input_value={'Name': 'Ryan', 'Age': 3... 'playing video games']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
nationality
  Field required [type=missing, input_value={'Name': 'Ryan', 'Age': 3... 'playing video games']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
occupation
  Field required [type=missing, input_value={'Name': 'Ryan', 'Age': 3... 'playing video games']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
pets
  Field required [type=missing, input_value={'Name': 'Ryan', 'Age': 3... 'playing video games']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
hobbies
  Field required [type=missing, input_value={'Name': 'Ryan', 'Age': 3... 'playing video games']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing

What happened? The reply is valid JSON, so what is the problem? The field names don't match: the LLM returned capitalised keys such as "Name" and "Age", whereas our Person model expects "name" and "age", so Pydantic reports all six fields as missing. One way to fix this is to also feed the schema directly into the prompt, so the LLM knows exactly which keys to use.

A schema is essentially a description of the data we expect. A handy feature of Pydantic models is that the class definition can be serialised into a JSON schema, a plain dictionary describing each field's name, type and description, using the model_json_schema class method.

schema = Person.model_json_schema()
print(type(schema))
pprint(schema)

<class 'dict'>

{
│   'properties': {
│   │   'name': {
│   │   │   'anyOf': [{'type': 'string'}, {'type': 'null'}],
│   │   │   'description': 'The name of the person',
│   │   │   'title': 'Name'
│   │   },
│   │   'age': {
│   │   │   'anyOf': [{'type': 'integer'}, {'type': 'null'}],
│   │   │   'description': 'The age of the person',
│   │   │   'title': 'Age'
│   │   },
│   │   'nationality': {
│   │   │   'anyOf': [{'type': 'string'}, {'type': 'null'}],
│   │   │   'description': 'The nationality of the person',
│   │   │   'title': 'Nationality'
│   │   },
│   │   'occupation': {
│   │   │   'anyOf': [{'type': 'string'}, {'type': 'null'}],
│   │   │   'description': 'The occupation of the person',
│   │   │   'title': 'Occupation'
│   │   },
│   │   'pets': {
│   │   │   'anyOf': [{'items': {'type': 'string'}, 'type': 'array'}, {'type': 'null'}],
│   │   │   'description': 'The pets of the person',
│   │   │   'title': 'Pets'
│   │   },
│   │   'hobbies': {
│   │   │   'anyOf': [{'items': {'type': 'string'}, 'type': 'array'}, {'type': 'null'}],
│   │   │   'description': 'The hobbies of the person',
│   │   │   'title': 'Hobbies'
│   │   }
│   },
│   'required': ['name', 'age', 'nationality', 'occupation', 'pets', 'hobbies'],
│   'title': 'Person',
│   'type': 'object'
}

Now we include this schema in the prompt:

system_prompt = (
    "Your main role is to analyse a piece of unstructured text and extract the following information:\n"
    "- Name\n"
    "- Age\n"
    "- Nationality\n"
    "- Occupation\n"
    "- A list of any pets\n"
    "- A list of any hobbies\n\n"
    "If any acronyms are used, please expand them.\n\n"
    f"Return the information in JSON format according to the following schema:\n\n{schema}\n\n"
    "Here is a description:\n\n"
)

print(system_prompt)

Your main role is to analyse a piece of unstructured text and extract the following information:
- Name
- Age
- Nationality
- Occupation
- A list of any pets
- A list of any hobbies

If any acronyms are used, please expand them.

Return the information in JSON format according to the following schema:

{'properties': {'name': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'description': 'The name of the person', 'title': 'Name'}, 'age': {'anyOf': [{'type': 'integer'}, {'type': 'null'}], 'description': 'The age of the person', 'title': 'Age'}, 'nationality': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'description': 'The nationality of the person', 'title': 'Nationality'}, 'occupation': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'description': 'The occupation of the person', 'title': 'Occupation'}, 'pets': {'anyOf': [{'items': {'type': 'string'}, 'type': 'array'}, {'type': 'null'}], 'description': 'The pets of the person', 'title': 'Pets'}, 'hobbies': {'anyOf': [{'items': {'type': 'string'}, 'type': 'array'}, {'type': 'null'}], 'description': 'The hobbies of the person', 'title': 'Hobbies'}}, 'required': ['name', 'age', 'nationality', 'occupation', 'pets', 'hobbies'], 'title': 'Person', 'type': 'object'}

Here is a description:

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": description},
  ],
  max_tokens=512,
  response_format={"type": "json_object"},
  temperature=0.0
)

print(response.choices[0].message.content)

{
  "name": "Ryan",
  "age": 35,
  "nationality": "New Zealand",
  "occupation": "Machine Learning Engineer",
  "pets": ["cat"],
  "hobbies": ["hiking", "playing video games"]
}

Good: this time the keys match the schema, so we can deserialise the output into a Person.

json_content = json.loads(response.choices[0].message.content)
person = Person(**json_content)

pprint(person, expand_all=True)

Person(
│   name='Ryan',
│   age=35,
│   nationality='New Zealand',
│   occupation='Machine Learning Engineer',
│   pets=[
│   │   'cat'
│   ],
│   hobbies=[
│   │   'hiking',
│   │   'playing video games'
│   ]
)