Declare JSON encoder on the class itself for Pydantic
Question

I have the following class:

```python
class Thing:
    def __init__(self, x: str):
        self.x = x

    def __str__(self):
        return self.x

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v: str) -> "Thing":
        return cls(v)
```
Due to the validator method I can use this class as a custom field type in a Pydantic model:

```python
from pydantic import BaseModel
from thing import Thing

class Model(BaseModel):
    thing: Thing
```
But if I want to serialize to JSON, I need to set the `json_encoders` option on the Pydantic model:

```python
class Model(BaseModel):
    class Config:
        json_encoders = {
            Thing: str
        }

    thing: Thing
```
Now Pydantic can serialize `Thing`s to JSON and back. But the config lives in two places: partly on `Model` and partly on the class `Thing` itself. I'd like to set it all on `Thing`.

Is there any way to set the `json_encoders` option on `Thing` so that Pydantic knows how to handle it transparently?

Note that `Thing` is minimized here: it has a lot of logic, and I'm not just trying to declare a custom `str` type.
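The `json_encoders = {Thing: str}` entry effectively tells Pydantic to call `str` on every `Thing` during JSON export, much like passing `default=str` to plain `json.dumps`. A minimal stdlib-only sketch of that behavior:

```python
import json

class Thing:
    def __init__(self, x: str):
        self.x = x

    def __str__(self):
        return self.x

# Without help, json has no idea how to encode a Thing:
try:
    json.dumps({"thing": Thing("foo")})
except TypeError as exc:
    print(exc)  # Object of type Thing is not JSON serializable

# `default=str` plays the same role as `json_encoders = {Thing: str}`:
# any object json cannot encode natively is passed through `str` first.
print(json.dumps({"thing": Thing("foo")}, default=str))  # -> {"thing": "foo"}
```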
Answer 1
Score: 3
This is actually an issue that goes much deeper than Pydantic models, in my opinion. There is an ongoing discussion about whether a standard protocol with a method like `__json__` or `__serialize__` should be introduced in Python.

The problem is that Pydantic is confined by the same limitations as the standard library's `json` module, in that the encoding/serialization logic for custom types is separated from the class itself.

Whether or not the broader idea of introducing such a protocol makes sense, we can piggy-back off of it a little and define a customized version of `json.dumps` that checks for the presence of a `__serialize__` method and uses it as the `default` function to serialize the object. (See the `json.dump` documentation for an explanation of the `default` parameter.)
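As a quick illustration of that `default` hook, here is a standalone stdlib sketch (`Point` and `encode_point` are made-up example names, not part of the answer's solution):

```python
import json

class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

def encode_point(obj: object) -> object:
    # Called by json.dumps only for objects it cannot encode natively.
    if isinstance(obj, Point):
        return {"x": obj.x, "y": obj.y}
    raise TypeError(f"Object not serializable: {obj!r}")

print(json.dumps({"p": Point(1.0, 2.0)}, default=encode_point))
# -> {"p": {"x": 1.0, "y": 2.0}}
```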
Then we can set up a custom base model with the `Config.json_dumps` option set to that function. That way, all child models automatically fall back to it for serialization (unless overridden, e.g. via the `encoder` argument to the `BaseModel.json` method).
Here is an example:

base.py

```python
from collections.abc import Callable
from json import dumps as json_dumps
from typing import Any

from pydantic import BaseModel as PydanticBaseModel


def json_dumps_extended(obj: object, **kwargs: Any) -> str:
    default: Callable[[object], object] = kwargs.pop("default", lambda x: x)

    def custom_default(to_encode: object) -> object:
        serialize_method = getattr(to_encode, "__serialize__", None)
        if serialize_method is None:
            return default(to_encode)
        return serialize_method()  # <-- already bound to `to_encode`

    return json_dumps(obj, default=custom_default, **kwargs)


class BaseModel(PydanticBaseModel):
    class Config:
        json_dumps = json_dumps_extended
```
application.py

```python
from __future__ import annotations

from collections.abc import Callable, Iterator

from .base import BaseModel


class Thing:
    def __init__(self, x: str) -> None:
        self.x = x

    def __str__(self) -> str:
        return self.x

    def __serialize__(self) -> str:  # <-- this is the magic method
        return self.x

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[..., Thing]]:
        yield cls.validate

    @classmethod
    def validate(cls, v: str) -> Thing:
        return cls(v)


class Model(BaseModel):
    thing: Thing
    num: float = 3.14


instance = Model(thing=Thing("foo"))
print(instance.json(indent=4))
```
Output:

```json
{
    "thing": "foo",
    "num": 3.14
}
```

Note for Python < 3.9 users: import the `Callable` and `Iterator` types from `typing` instead of `collections.abc`.
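Since `json_dumps_extended` relies only on the stdlib `json` module, its fallback mechanism can also be checked without Pydantic at all (same function body as in base.py above, with a stripped-down `Thing`):

```python
from collections.abc import Callable
from json import dumps as json_dumps
from typing import Any


def json_dumps_extended(obj: object, **kwargs: Any) -> str:
    default: Callable[[object], object] = kwargs.pop("default", lambda x: x)

    def custom_default(to_encode: object) -> object:
        serialize_method = getattr(to_encode, "__serialize__", None)
        if serialize_method is None:
            return default(to_encode)
        return serialize_method()

    return json_dumps(obj, default=custom_default, **kwargs)


class Thing:
    def __init__(self, x: str) -> None:
        self.x = x

    def __serialize__(self) -> str:
        return self.x


print(json_dumps_extended({"thing": Thing("foo"), "num": 3.14}))
# -> {"thing": "foo", "num": 3.14}
```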
PS

If you want to be able to reuse this approach to serialization in more places than just the base model, it may be worth putting a bit more effort into the types. A `runtime_checkable` custom protocol for our `__serialize__` method may be useful.

Also, we can make the `json_dumps_extended` function a bit less clunky by using `functools.partial`.

Here is a slightly more sophisticated version of the suggested base.py:
```python
from collections.abc import Callable
from functools import partial
from json import dumps as json_dumps
from typing import Any, Optional, Protocol, TypeVar, overload, runtime_checkable

from pydantic import BaseModel as PydanticBaseModel

T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)
Func1Arg = Callable[[object], T]


@runtime_checkable
class Serializable(Protocol[T_co]):
    def __serialize__(self) -> T_co: ...


@overload
def serialize(obj: Serializable[T_co]) -> T_co: ...
@overload
def serialize(obj: Any, fallback: Func1Arg[T]) -> T: ...

def serialize(obj: Any, fallback: Optional[Func1Arg[Any]] = None) -> Any:
    if isinstance(obj, Serializable):
        return obj.__serialize__()
    if fallback is None:
        raise TypeError(f"Object not serializable: {obj}")
    return fallback(obj)


def _id(x: T) -> T:
    return x


def json_dumps_extended(obj: object, **kwargs: Any) -> str:
    custom_default = partial(serialize, fallback=kwargs.pop("default", _id))
    return json_dumps(obj, default=custom_default, **kwargs)


class BaseModel(PydanticBaseModel):
    class Config:
        json_dumps = json_dumps_extended
```
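To see the `runtime_checkable` protocol and the `serialize` helper in action on their own, here is a small usage sketch (`Token` is a hypothetical example class, not from the answer; the `serialize` body is simplified to drop the overloads):

```python
from typing import Any, Callable, Optional, Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class Serializable(Protocol[T_co]):
    def __serialize__(self) -> T_co: ...


class Token:
    def __init__(self, value: str) -> None:
        self.value = value

    def __serialize__(self) -> str:
        return self.value


def serialize(obj: Any, fallback: Optional[Callable[[object], Any]] = None) -> Any:
    # isinstance() here only checks that a `__serialize__` method exists
    # (structural typing), not any nominal inheritance relationship.
    if isinstance(obj, Serializable):
        return obj.__serialize__()
    if fallback is None:
        raise TypeError(f"Object not serializable: {obj}")
    return fallback(obj)


print(serialize(Token("abc")))       # abc
print(serialize(42, fallback=str))   # 42
print(isinstance(42, Serializable))  # False
```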
An alternative might have been to monkey-patch `JSONEncoder.default` directly. But without further configuration, Pydantic still seems to perform its own type checks and prevent serialization before that method is even called.
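Outside of Pydantic, such a monkey-patch does work against the plain `json` module. A minimal sketch of the idea (globally patching a stdlib class is generally discouraged; this is only for illustration):

```python
import json
from json import JSONEncoder

_original_default = JSONEncoder.default


def _patched_default(self: JSONEncoder, obj: object) -> object:
    serialize_method = getattr(obj, "__serialize__", None)
    if serialize_method is not None:
        return serialize_method()
    return _original_default(self, obj)


# Affects every encoder that does not override `default` itself:
JSONEncoder.default = _patched_default  # type: ignore[method-assign]


class Thing:
    def __init__(self, x: str) -> None:
        self.x = x

    def __serialize__(self) -> str:
        return self.x


print(json.dumps({"thing": Thing("foo")}))  # -> {"thing": "foo"}
```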
I don't think we have a better option until some standard serialization protocol (at least for JSON) is introduced.