I recently needed a solution for serializing and deserializing Python objects. We
initially used pickle, but it was always a stop-gap, and it became an issue for debugging.
We switched to Pydantic, which provided JSON serialization, but it left a problem
with generic fields.
We had messages like this:
from pydantic import BaseModel
This worked for pickle, as it squirrelled away the Python metadata, but it was
out of scope for pydantic, which wanted to know all the candidate types in order to
provide a union.
Fortunately, pydantic provides custom serialization.
The Serializer
Our serializer looks like this:
from pydantic import BaseModel, ValidationInfo
The serializer allows us to separate the stages of JSON serialization. First we
break the object into a JSON-style Python dictionary (using mode='json'),
which ensures all of the dictionary values are serializable by JSON. Then we add
the metadata: the class's module name and qualified name worked well. We can
then pass this on to pydantic to handle the rest.
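As a rough illustration of those two stages, here is a minimal sketch rather than our exact code. The key names `__module__` and `__qualname__`, the function name `serialize_model` and the `User` model are assumptions made for the example.

```python
from datetime import date
from typing import Any

from pydantic import BaseModel


def serialize_model(model: BaseModel) -> dict[str, Any]:
    """Dump a model to JSON-safe values and tag it with its class location."""
    # Stage 1: mode="json" converts values such as dates into JSON-compatible types.
    data = model.model_dump(mode="json")
    # Stage 2: add the metadata (assumed key names) so the class can be found later.
    data["__module__"] = type(model).__module__
    data["__qualname__"] = type(model).__qualname__
    return data


class User(BaseModel):  # hypothetical example model
    name: str
    joined: date


payload = serialize_model(User(name="Ann", joined=date(2024, 1, 2)))
```

The resulting dictionary holds only strings and other JSON-friendly values, so it can be handed on to pydantic or to `json.dumps` without further conversion.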
The Deserializer
In pydantic, deserialization is part of “validation”. The validation part
looked like this:
import json
There’s a bunch of code here. The key part is in deserialize_model_from_dict
where we can see the inverse of the serialization. It takes the module name and
class name from the supplied dictionary and creates the model class, from which it validates
the model.
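The class lookup at the heart of that inverse step can be sketched with the standard library alone. The helper name `resolve_class` is hypothetical, and a standard library class stands in for a model so the example runs without any other code; with a pydantic model, the resolved class would then be used to validate the remaining fields.

```python
import importlib
from typing import Any


def resolve_class(module_name: str, qualname: str) -> Any:
    """Import the module, then walk the dotted qualified name to the class."""
    obj: Any = importlib.import_module(module_name)
    # A qualified name such as "Outer.Inner" needs one getattr per part.
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


# Stand-in example: look up collections.OrderedDict by name.
ordered_dict_cls = resolve_class("collections", "OrderedDict")
```

Two caveats with this approach: classes defined inside a function have `<locals>` in their qualified name and cannot be found this way, and the module named in the data gets imported, so it suits data from a trusted source.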
The rest of the code provides the plumbing for pydantic. The entrypoint is the
validate_model function. This gets called via a number of different routes.
The first is when a Python model is created, when the info.mode will be 'python' and the value will be of type BaseModel.
The second is when JSON data is provided, when the info.mode is 'json' and the type is some kind of text data.
The third is during serialization, when a dictionary is generated by an intermediate step. Here the info.mode will be either 'python' or 'json' and the type of the value is a dict.
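The three routes can be told apart by the type of the incoming value. The following is a sketch of that dispatch, not our actual validator: it leaves out the `info` argument, and the metadata key names and the `User` model are assumptions.

```python
import importlib
import json
from typing import Any

from pydantic import BaseModel


def deserialize_model_from_dict(data: dict[str, Any]) -> BaseModel:
    """Find the class named in the metadata and validate the remaining fields."""
    fields = dict(data)  # copy, so the caller's dictionary is left untouched
    cls: Any = importlib.import_module(fields.pop("__module__"))
    for part in fields.pop("__qualname__").split("."):
        cls = getattr(cls, part)
    if not (isinstance(cls, type) and issubclass(cls, BaseModel)):
        raise ValueError("metadata does not name a pydantic model")
    return cls.model_validate(fields)


def validate_model(value: Any) -> BaseModel:
    if isinstance(value, BaseModel):
        return value  # route 1: already a model
    if isinstance(value, (str, bytes, bytearray)):
        return deserialize_model_from_dict(json.loads(value))  # route 2: JSON text
    if isinstance(value, dict):
        return deserialize_model_from_dict(value)  # route 3: intermediate dict
    raise ValueError(f"cannot build a model from {type(value).__name__}")


class User(BaseModel):  # hypothetical example model
    name: str


user = User(name="Ann")
as_dict = {"__module__": User.__module__, "__qualname__": "User", "name": "Ann"}
from_dict = validate_model(as_dict)
from_text = validate_model(json.dumps(as_dict))
```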
The Model Field Attributes
This all gets wired up in the following manner:
from typing import Annotated
Note how we have to use typing annotations to pass in the serializer and validator.
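For readers who want to see the shape of that wiring, here is a self-contained sketch using pydantic's `PlainSerializer` and `PlainValidator`. Whether the original code uses these exact classes is an assumption, as are the alias name `AnyModel`, the metadata key names and the `User` model.

```python
import importlib
import json
from typing import Annotated, Any

from pydantic import BaseModel, PlainSerializer, PlainValidator


def serialize_model(model: BaseModel) -> dict[str, Any]:
    data = model.model_dump(mode="json")
    data["__module__"] = type(model).__module__
    data["__qualname__"] = type(model).__qualname__
    return data


def validate_model(value: Any) -> BaseModel:
    if isinstance(value, BaseModel):
        return value
    if isinstance(value, dict):
        fields = dict(value)
        cls: Any = importlib.import_module(fields.pop("__module__"))
        for part in fields.pop("__qualname__").split("."):
            cls = getattr(cls, part)
        return cls.model_validate(fields)
    raise ValueError(f"cannot build a model from {type(value).__name__}")


# The annotation carries both hooks, so any field typed AnyModel uses them.
AnyModel = Annotated[
    BaseModel,
    PlainValidator(validate_model),
    PlainSerializer(serialize_model),
]


class User(BaseModel):  # hypothetical example model
    name: str


class Update(BaseModel):
    model: AnyModel


update = Update(model=User(name="Ann"))
text = update.model_dump_json()
restored = Update.model_validate_json(text)
```

Defining the annotated type once as an alias keeps the field declarations short when several messages carry a generic model.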
The generated JSON looks as follows:
>> model = Update(
We keep the dunder names to avoid collisions and to flatter the pythonistas.
Whole Message Serialization
It turned out that it would be useful to know the metadata of the root message too, as we wanted to save all the messages to a data store and replay them. This required just one extra function.
import json
Now we can create the metadata at the root level.
update = Update(model=user)
We can see there is metadata at the root level.
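To show why root-level metadata helps with saving and replaying, here is a small sketch in which a list of JSON strings stands in for the data store. The function names `dump_message` and `load_message`, the key names and both message classes are hypothetical.

```python
import importlib
import json
from typing import Any

from pydantic import BaseModel


def dump_message(model: BaseModel) -> str:
    """Serialize any model to JSON text with its class location at the root."""
    data = model.model_dump(mode="json")
    data["__module__"] = type(model).__module__
    data["__qualname__"] = type(model).__qualname__
    return json.dumps(data)


def load_message(text: str) -> BaseModel:
    """Rebuild the model without knowing its type in advance."""
    data = json.loads(text)
    cls: Any = importlib.import_module(data.pop("__module__"))
    for part in data.pop("__qualname__").split("."):
        cls = getattr(cls, part)
    return cls.model_validate(data)


class UserCreated(BaseModel):  # hypothetical message types
    user_id: int


class UserRenamed(BaseModel):
    user_id: int
    name: str


messages = [UserCreated(user_id=1), UserRenamed(user_id=1, name="Ann")]
log = [dump_message(message) for message in messages]  # stand-in data store
replayed = [load_message(line) for line in log]
```

Each stored line is self-describing, so the replay loop needs no table of message types.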
Finally we can do a full roundtrip with any base model:
update = Update(model=user)
Now we can save and retrieve any model. Happy days!
You can find the code for this blog here.