2023-06-19 17:44:55

Declare JSON encoder on the class itself for Pydantic

Question

I have the following class:

```python
class Thing:
    def __init__(self, x: str):
        self.x = x

    def __str__(self):
        return self.x

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v: str) -> "Thing":
        return cls(v)
```

Due to the validator method I can use this class as a custom field type in a Pydantic model:

```python
from pydantic import BaseModel
from thing import Thing

class Model(BaseModel):
    thing: Thing
```

But if I want to serialize to JSON, I need to set the json_encoders option on the Pydantic model:

```python
class Model(BaseModel):
    class Config:
        json_encoders = {
            Thing: str
        }
    thing: Thing
```

Now Pydantic can serialize Things to JSON and back. But the config is in two places: Partly on the Model and partly on the class Thing. I'd like to set it all on Thing.

Is there any way to set the json_encoders option on Thing so Pydantic knows how to handle it transparently?

Note that Thing is minimized here: It has a lot of logic and I'm not just trying to declare a custom str type.

Answer 1

Score: 3

This is actually an issue that goes much deeper than Pydantic models in my opinion. I found this ongoing discussion about whether a standard protocol with a method like __json__ or __serialize__ should be introduced in Python.

The problem is that Pydantic is confined by those same limitations of the standard library's json module, in that encoding/serialization logic for custom types is separated from the class itself.

Whether or not the broader idea of introducing such a protocol makes sense, we can piggy-back off of it a little to define a customized version of json.dumps that checks for the presence of e.g. a __serialize__ method and uses that as the default function to serialize the object. (See the json.dump documentation for an explanation of the default parameter.)
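To make the role of the default parameter concrete, here is a minimal standard-library sketch that does not involve Pydantic at all. The Temperature class and the fallback_encoder function are hypothetical names chosen for illustration; only json.dumps and its default parameter come from the standard library, and __serialize__ is the convention assumed in this answer, not a Python standard.

```python
import json


class Temperature:
    """Hypothetical custom type that json.dumps cannot encode on its own."""

    def __init__(self, celsius: float) -> None:
        self.celsius = celsius

    def __serialize__(self) -> float:
        # Assumed convention from this answer, not a Python standard.
        return self.celsius


def fallback_encoder(obj: object) -> object:
    """Called by json.dumps only for objects it cannot encode natively."""
    serialize_method = getattr(obj, "__serialize__", None)
    if serialize_method is None:
        raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
    return serialize_method()


payload = {"reading": Temperature(21.5)}

# Without a default hook, the standard encoder rejects the custom object.
try:
    json.dumps(payload)
except TypeError as exc:
    error_message = str(exc)

# With the hook, the object is replaced by whatever __serialize__ returns.
encoded = json.dumps(payload, default=fallback_encoder)
print(encoded)  # {"reading": 21.5}
```

The hook is consulted only for values the encoder does not already understand, so built-in types such as str, float, list and dict are unaffected.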

Then we can set up a custom base model with the Config.json_dumps option set to that function. That way all child models would automatically fall back to that for serialization (unless overridden by the encoder argument to the BaseModel.json method for example).

Here is an example:

base.py

```python
from collections.abc import Callable
from json import dumps as json_dumps
from typing import Any

from pydantic import BaseModel as PydanticBaseModel


def json_dumps_extended(obj: object, **kwargs: Any) -> str:
    # Pydantic passes its own encoder as `default`; keep it as the fallback.
    default: Callable[[object], object] = kwargs.pop("default", lambda x: x)

    def custom_default(to_encode: object) -> object:
        serialize_method = getattr(to_encode, "__serialize__", None)
        if serialize_method is None:
            return default(to_encode)
        return serialize_method()  # <-- already bound to `to_encode`

    return json_dumps(obj, default=custom_default, **kwargs)


class BaseModel(PydanticBaseModel):
    class Config:
        json_dumps = json_dumps_extended
```

application.py

```python
from __future__ import annotations
from collections.abc import Callable, Iterator

from .base import BaseModel


class Thing:
    def __init__(self, x: str) -> None:
        self.x = x

    def __str__(self) -> str:
        return self.x

    def __serialize__(self) -> str:  # <-- this is the magic method
        return self.x

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[..., Thing]]:
        yield cls.validate

    @classmethod
    def validate(cls, v: str) -> Thing:
        return cls(v)


class Model(BaseModel):
    thing: Thing
    num: float = 3.14


instance = Model(thing=Thing("foo"))
print(instance.json(indent=4))
```

Output:

```json
{
    "thing": "foo",
    "num": 3.14
}
```

Note for Python <3.9 users: Import the Callable and Iterator types from typing instead of collections.abc.

PS

If you want to be able to re-use this approach to serialization in more places than just the base model, it may be a good idea to put a bit more effort into the types. A runtime_checkable custom protocol for our __serialize__ method may be useful.
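As a small aside on what a runtime_checkable protocol actually checks, the following standard-library sketch may help. SupportsSerialize, WithMethod and WithoutMethod are hypothetical names used only for this illustration; Protocol and runtime_checkable are the real typing utilities.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSerialize(Protocol):
    """Hypothetical protocol: any object exposing a __serialize__ method."""

    def __serialize__(self) -> object: ...


class WithMethod:
    def __serialize__(self) -> str:
        return "ok"


class WithoutMethod:
    pass


# isinstance() only checks that the attribute exists; it does not verify
# the method's signature or return type.
has_method = isinstance(WithMethod(), SupportsSerialize)
lacks_method = isinstance(WithoutMethod(), SupportsSerialize)
print(has_method, lacks_method)  # True False
```

Because the runtime check is purely structural, classes do not need to inherit from the protocol; defining the method is enough.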

Also, we can make the json_dumps_extended function a bit less clunky by using functools.partial.

Here is a slightly more sophisticated version of the suggested base.py:

```python
from collections.abc import Callable
from functools import partial
from json import dumps as json_dumps
from typing import Any, Optional, Protocol, TypeVar, overload, runtime_checkable

from pydantic import BaseModel as PydanticBaseModel

T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)
Func1Arg = Callable[[object], T]


@runtime_checkable
class Serializable(Protocol[T_co]):
    def __serialize__(self) -> T_co: ...


@overload
def serialize(obj: Serializable[T_co]) -> T_co: ...


@overload
def serialize(obj: Any, fallback: Func1Arg[T]) -> T: ...


def serialize(obj: Any, fallback: Optional[Func1Arg[Any]] = None) -> Any:
    # Prefer the object's own __serialize__ method when it has one.
    if isinstance(obj, Serializable):
        return obj.__serialize__()
    if fallback is None:
        raise TypeError(f"Object not serializable: {obj}")
    return fallback(obj)


def _id(x: T) -> T:
    return x


def json_dumps_extended(obj: object, **kwargs: Any) -> str:
    # Bind the caller's `default` (if any) as the fallback encoder.
    custom_default = partial(serialize, fallback=kwargs.pop("default", _id))
    return json_dumps(obj, default=custom_default, **kwargs)


class BaseModel(PydanticBaseModel):
    class Config:
        json_dumps = json_dumps_extended
```

Another alternative might have been to just monkey-patch JSONEncoder.default directly. But without further configurations Pydantic seems to still perform the type checks itself and prevent serialization before that method is even called.

I don't think we have a better option, until some standard serialization protocol (at least for JSON) is introduced.
