Declare JSON encoder on the class itself for Pydantic

Question

I have the following class:

```python
class Thing:
    def __init__(self, x: str):
        self.x = x

    def __str__(self):
        return self.x

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v: str) -> "Thing":
        return cls(v)
```

Thanks to the validator method, I can use this class as a custom field type in a Pydantic model:

```python
from pydantic import BaseModel
from thing import Thing

class Model(BaseModel):
    thing: Thing
```

But if I want to serialize to JSON, I need to set the json_encoders option on the Pydantic model:

```python
class Model(BaseModel):
    class Config:
        json_encoders = {
            Thing: str
        }
    thing: Thing
```

Now Pydantic can serialize Things to JSON and back. But the config is in two places: Partly on the Model and partly on the class Thing. I'd like to set it all on Thing.

Is there any way to set the json_encoders option on Thing so Pydantic knows how to handle it transparently?

Note that Thing is minimized here: It has a lot of logic and I'm not just trying to declare a custom str type.

Answer 1

Score: 3

This is actually an issue that goes much deeper than Pydantic models, in my opinion. There is an ongoing discussion about whether a standard protocol with a method like __json__ or __serialize__ should be introduced in Python.

The problem is that Pydantic is confined by those same limitations of the standard library's json module, in that encoding/serialization logic for custom types is separated from the class itself.

Whether or not the broader idea of introducing such a protocol makes sense, we can piggy-back off of it a little to define a customized version of json.dumps that checks for the presence of e.g. a __serialize__ method and uses that as the default function to serialize the object. (See the json.dump documentation for an explanation of the default parameter.)
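To make the role of the default parameter concrete, here is a minimal standard-library sketch that does not involve Pydantic at all. The Point class and the encode_with_serialize helper are hypothetical names used only for illustration, and __serialize__ is the ad hoc convention proposed in this answer, not a method that Python itself recognizes.

```python
import json


class Point:
    """Hypothetical custom type that json.dumps cannot encode natively."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __serialize__(self) -> dict:
        # Ad hoc convention from this answer, not a standard dunder method.
        return {"x": self.x, "y": self.y}


def encode_with_serialize(obj: object) -> object:
    """json.dumps calls this only for objects it cannot encode by itself."""
    method = getattr(obj, "__serialize__", None)
    if method is None:
        raise TypeError(
            f"Object of type {type(obj).__name__} is not JSON serializable"
        )
    return method()


encoded = json.dumps({"p": Point(1, 2)}, default=encode_with_serialize)
print(encoded)  # {"p": {"x": 1, "y": 2}}
```

The serialization logic now lives on Point itself; the only thing the caller has to supply is the generic default hook.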

Then we can set up a custom base model with the Config.json_dumps option set to that function. That way all child models would automatically fall back to that for serialization (unless overridden by the encoder argument to the BaseModel.json method for example).

Here is an example:

base.py

```python
from collections.abc import Callable
from json import dumps as json_dumps
from typing import Any

from pydantic import BaseModel as PydanticBaseModel


def json_dumps_extended(obj: object, **kwargs: Any) -> str:
    default: Callable[[object], object] = kwargs.pop("default", lambda x: x)

    def custom_default(to_encode: object) -> object:
        serialize_method = getattr(to_encode, "__serialize__", None)
        if serialize_method is None:
            return default(to_encode)
        return serialize_method()  # <-- already bound to `to_encode`

    return json_dumps(obj, default=custom_default, **kwargs)


class BaseModel(PydanticBaseModel):
    class Config:
        json_dumps = json_dumps_extended
```

application.py

```python
from __future__ import annotations
from collections.abc import Callable, Iterator

from .base import BaseModel


class Thing:
    def __init__(self, x: str) -> None:
        self.x = x

    def __str__(self) -> str:
        return self.x

    def __serialize__(self) -> str:  # <-- this is the magic method
        return self.x

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[..., Thing]]:
        yield cls.validate

    @classmethod
    def validate(cls, v: str) -> Thing:
        return cls(v)


class Model(BaseModel):
    thing: Thing
    num: float = 3.14


instance = Model(thing=Thing("foo"))
print(instance.json(indent=4))
```

Output:

```
{
    "thing": "foo",
    "num": 3.14
}
```

Note for Python <3.9 users: Import the Callable and Iterator types from typing instead of collections.abc.


PS

If you want to be able to re-use this approach to serialization in more places than just the base model, it may be a good idea to put a bit more effort into the types. A runtime_checkable custom protocol for our __serialize__ method may be useful.
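As a quick illustration of what runtime_checkable provides, here is a small standalone sketch. The names SupportsSerialize, WithMethod and WithoutMethod are made up for this example. Note that an isinstance check against such a protocol only verifies that the method exists, not that its signature or return type matches.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSerialize(Protocol):
    """Hypothetical protocol: anything exposing a __serialize__ method."""

    def __serialize__(self) -> object: ...


class WithMethod:
    def __serialize__(self) -> str:
        return "ok"


class WithoutMethod:
    pass


# Structural check: no inheritance from SupportsSerialize is required.
has_method = isinstance(WithMethod(), SupportsSerialize)
lacks_method = isinstance(WithoutMethod(), SupportsSerialize)
print(has_method, lacks_method)  # True False
```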

Also, we can make the json_dumps_extended function a bit less clunky by using functools.partial.

Here is a slightly more sophisticated version of the suggested base.py:

```python
from collections.abc import Callable
from functools import partial
from json import dumps as json_dumps
from typing import Any, Optional, Protocol, TypeVar, overload, runtime_checkable

from pydantic import BaseModel as PydanticBaseModel

T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)
Func1Arg = Callable[[object], T]


@runtime_checkable
class Serializable(Protocol[T_co]):
    def __serialize__(self) -> T_co: ...


@overload
def serialize(obj: Serializable[T_co]) -> T_co: ...


@overload
def serialize(obj: Any, fallback: Func1Arg[T]) -> T: ...


def serialize(obj: Any, fallback: Optional[Func1Arg[Any]] = None) -> Any:
    if isinstance(obj, Serializable):
        return obj.__serialize__()
    if fallback is None:
        raise TypeError(f"Object not serializable: {obj}")
    return fallback(obj)


def _id(x: T) -> T:
    return x


def json_dumps_extended(obj: object, **kwargs: Any) -> str:
    custom_default = partial(serialize, fallback=kwargs.pop("default", _id))
    return json_dumps(obj, default=custom_default, **kwargs)


class BaseModel(PydanticBaseModel):
    class Config:
        json_dumps = json_dumps_extended
```

Another alternative might have been to just monkey-patch JSONEncoder.default directly. But without further configuration, Pydantic seems to still perform the type checks itself and prevent serialization before that method is even called.

I don't think we have a better option, until some standard serialization protocol (at least for JSON) is introduced.

Posted by huangapple on June 19, 2023, 17:44:55
When reposting, please keep the link to this article: https://go.coder-hub.com/76505431.html