How to export a Pydantic model instance as YAML with URL type as string

Question

I have a Pydantic model with a field of type AnyUrl.
When exporting the model to YAML, the AnyUrl is serialized as individual field slots, instead of a single string URL (perhaps due to how the AnyUrl.__repr__ method is implemented).

For example:

```python
from pydantic import BaseModel, AnyUrl
import yaml

class MyModel(BaseModel):
    url: AnyUrl


data = {'url': 'https://www.example.com'}
model = MyModel.parse_obj(data)

y = yaml.dump(model.dict(), indent=4)
print(y)
```

Produces:

```yaml
url: !!python/object/new:pydantic.networks.AnyUrl
    args:
    - https://www.example.com
    state: !!python/tuple
    - null
    -   fragment: null
        host: www.example.com
        host_type: domain
        password: null
        path: null
        port: null
        query: null
        scheme: https
        tld: com
        user: null
```

Ideally, I would like the serialized YAML to contain https://www.example.com instead of individual fields.

I have tried to override the __repr__ method of AnyUrl to return the AnyUrl object itself, as it extends the str class, but with no luck.
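A minimal reproduction may help show why overriding __repr__ has no effect. It is a sketch that uses only PyYAML and a hypothetical FakeUrl class (a plain str subclass standing in for Pydantic v1's AnyUrl). Because no representer is registered for the subclass, the default yaml.Dumper falls back to its generic object representer, which serializes the value through the pickle protocol (__reduce_ex__) and never calls __repr__.

```python
import yaml


class FakeUrl(str):
    """Hypothetical stand-in for Pydantic v1's AnyUrl, which also subclasses str."""

    def __repr__(self) -> str:
        # Overriding __repr__ does not change what PyYAML emits.
        return str.__str__(self)


dumped = yaml.dump({"url": FakeUrl("https://www.example.com")}, default_flow_style=False)
# The value is emitted under a !!python/object/new:...FakeUrl tag
# with the URL as a constructor argument, not as a plain string.
print(dumped)
```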

Answer 1

Score: 1

Unfortunately, the pyyaml documentation is just horrendous, so seemingly basic things like customizing (de-)serialization are a pain to figure out properly. But there are essentially two ways you could solve this.

Option A: Subclass YAMLObject

You had the right idea of subclassing AnyUrl, but the __repr__ method is irrelevant for YAML serialization. For that you need to do three things:

  1. Inherit from YAMLObject,
  2. define a custom yaml_tag, and
  3. override the to_yaml classmethod.

Then pyyaml will serialize this custom class (that inherits from both AnyUrl and YAMLObject) in accordance with what you define in to_yaml.

The to_yaml method always receives exactly two arguments:

  1. A yaml.Dumper instance with built-in capabilities to serialize standard types (via methods like represent_str for example) and
  2. the actual data to be serialized.
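To illustrate how to_yaml uses those two arguments, here is a sketch with two hypothetical str subclasses (PlainUrl and TaggedUrl, not part of PyYAML or Pydantic) that differ only in which dumper method they call. represent_str emits an ordinary, untagged YAML string, while represent_scalar with cls.yaml_tag keeps the custom tag in the output.

```python
import yaml


class PlainUrl(str, yaml.YAMLObject):
    """Hypothetical str subclass that serializes as an untagged YAML string."""

    # Defining yaml_tag is what makes the metaclass register to_yaml on yaml.Dumper.
    yaml_tag = "!PlainUrl"

    @classmethod
    def to_yaml(cls, dumper, data):
        # dumper is the active yaml.Dumper; data is the PlainUrl instance.
        return dumper.represent_str(str.__str__(data))


class TaggedUrl(str, yaml.YAMLObject):
    """Hypothetical str subclass that keeps its custom tag in the output."""

    yaml_tag = "!Url"

    @classmethod
    def to_yaml(cls, dumper, data):
        # represent_scalar emits the value under cls.yaml_tag instead of the str tag.
        return dumper.represent_scalar(cls.yaml_tag, str.__str__(data))


plain = yaml.dump({"url": PlainUrl("https://www.example.com")}, default_flow_style=False)
tagged = yaml.dump({"url": TaggedUrl("https://www.example.com")}, default_flow_style=False)
print(plain)   # url: https://www.example.com
print(tagged)  # contains the !Url tag
```

The untagged form is what lets yaml.safe_load read the value back as a plain string: SafeLoader has no constructor registered for !Url, so loading the tagged output with safe_load would raise a ConstructorError.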

To avoid adding or overriding further methods, you can leverage the fact that AnyUrl inherits from str (true in Pydantic v1, whose parse_obj/.dict() API the code below uses) and that the underlying str.__new__ method receives the full URL during construction. Calling str.__str__ directly therefore returns that URL "as is", as a plain str.
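A quick check of that claim, using a hypothetical str subclass that deliberately overrides __str__ (Pydantic is not needed for this): str(...) picks up the subclass override, while calling str.__str__ directly bypasses it and returns an exact str holding the value the instance was constructed with.

```python
class FakeUrl(str):
    """Hypothetical str subclass standing in for Pydantic v1's AnyUrl."""

    def __str__(self) -> str:
        # A subclass override that str(...) would pick up.
        return "overridden"


url = FakeUrl("https://www.example.com")

print(str(url))  # overridden
plain = str.__str__(url)  # bypasses the subclass override
print(plain, type(plain).__name__)  # https://www.example.com str
```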

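```python
from pydantic import AnyUrl, BaseModel  # Pydantic v1 API (parse_obj, .dict())
from yaml import Dumper, ScalarNode, YAMLObject, dump, safe_load


class Url(AnyUrl, YAMLObject):
    # Defining yaml_tag makes YAMLObject's metaclass register to_yaml
    # as the representer for this class on yaml.Dumper.
    yaml_tag = "!Url"

    @classmethod
    def to_yaml(cls, dumper: Dumper, data: str) -> ScalarNode:
        # str.__str__ bypasses any subclass __str__/__repr__ and returns the plain URL;
        # represent_str emits it as an ordinary (untagged) YAML string.
        return dumper.represent_str(str.__str__(data))


class MyModel(BaseModel):
    foo: int = 0
    url: Url


obj = MyModel.parse_obj({"url": "https://www.example.com"})
print(obj)

serialized = dump(obj.dict()).strip()
print(serialized)

# safe_load returns a plain str, which Pydantic validates back into a Url.
deserialized = MyModel.parse_obj(safe_load(serialized))
print(deserialized == obj and isinstance(deserialized.url, Url))
```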

Output:

```text
foo=0 url=Url('https://www.example.com', scheme='https', host='www.example.com', tld='com', host_type='domain')
foo: 0
url: https://www.example.com
True
```

Option B: Register a representer function for AnyUrl

You can avoid defining your own subclass and instead globally register a function that defines how instances of AnyUrl should be serialized, by using the yaml.add_representer function.

That function takes two mandatory arguments (plus an optional Dumper argument, which defaults to yaml.Dumper):

  1. The class for which you want to define your custom serialization behavior and
  2. the representer function that defines that serialization behavior.

The representer function essentially has to have the same signature as the YAMLObject.to_yaml classmethod presented in option A, i.e. it takes a Dumper instance and the data to be serialized as arguments.
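One detail this approach depends on: add_representer is keyed on the exact type of the value, so subclasses (for example Pydantic v1's HttpUrl, which subclasses AnyUrl) would still fall back to the !!python/object/new output. PyYAML also provides add_multi_representer, which matches any class in the value's MRO. A sketch with hypothetical stand-in classes (FakeAnyUrl, FakeHttpUrl), registering on the default yaml.Dumper as option B does:

```python
import yaml


class FakeAnyUrl(str):
    """Hypothetical stand-in for Pydantic v1's AnyUrl."""


class FakeHttpUrl(FakeAnyUrl):
    """Hypothetical stand-in for a subclass such as Pydantic v1's HttpUrl."""


def url_representer(dumper, data):
    # Emit the plain URL string, as in option B.
    return dumper.represent_str(str.__str__(data))


data = {"url": FakeHttpUrl("https://www.example.com")}

# add_representer is keyed on the exact type, so the subclass is not covered.
yaml.add_representer(FakeAnyUrl, url_representer)
exact_only = yaml.dump(data, default_flow_style=False)

# add_multi_representer also applies to subclasses of FakeAnyUrl.
yaml.add_multi_representer(FakeAnyUrl, url_representer)
with_subclasses = yaml.dump(data, default_flow_style=False)

print(exact_only)       # still tagged !!python/object/new:...FakeHttpUrl
print(with_subclasses)  # url: https://www.example.com
```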

```python
from pydantic import AnyUrl, BaseModel  # Pydantic v1 API (parse_obj, .dict())
from yaml import Dumper, ScalarNode, add_representer, dump, safe_load


def url_representer(dumper: Dumper, data: AnyUrl) -> ScalarNode:
    # Same signature as YAMLObject.to_yaml (minus cls): emit the plain URL string.
    return dumper.represent_str(str.__str__(data))


# Registers url_representer on the default yaml.Dumper
# for values whose exact type is AnyUrl.
add_representer(AnyUrl, url_representer)


class MyModel(BaseModel):
    foo: int = 0
    url: AnyUrl


obj = MyModel.parse_obj({"url": "https://www.example.com"})
print(obj)

serialized = dump(obj.dict()).strip()
print(serialized)

deserialized = MyModel.parse_obj(safe_load(serialized))
print(deserialized == obj and isinstance(deserialized.url, AnyUrl))
```

Output is the same as with the code from option A.

The benefit of this approach is that it involves less code and avoids potential namespace collisions between the two parent classes in option A.

A potential drawback is that it modifies a global setting (the representers of the default yaml.Dumper) for the entire runtime of the program. As your application grows, this can become hard to keep track of, so it is something to be aware of in case you later decide to serialize AnyUrl objects differently.
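If the global registration is a concern, add_representer also accepts a Dumper argument, so the representer can be attached to a dedicated Dumper subclass and used only where that subclass is passed to yaml.dump explicitly. A minimal sketch, where UrlDumper and FakeUrl are hypothetical names (FakeUrl stands in for Pydantic v1's AnyUrl):

```python
import yaml


class FakeUrl(str):
    """Hypothetical stand-in for Pydantic v1's AnyUrl (a str subclass)."""


class UrlDumper(yaml.Dumper):
    """Hypothetical Dumper subclass that carries the custom representer."""


def url_representer(dumper, data):
    # Same representer as in option B: emit the plain URL string.
    return dumper.represent_str(str.__str__(data))


# Passing Dumper=UrlDumper registers the representer on UrlDumper only;
# PyYAML copies the representer table into the subclass before adding to it,
# so the default yaml.Dumper is left unchanged.
yaml.add_representer(FakeUrl, url_representer, Dumper=UrlDumper)

data = {"url": FakeUrl("https://www.example.com")}
scoped = yaml.dump(data, Dumper=UrlDumper, default_flow_style=False)
unscoped = yaml.dump(data, default_flow_style=False)

print(scoped)    # url: https://www.example.com
print(unscoped)  # still the !!python/object/new:...FakeUrl form
```

With the model from option B, the call would then be dump(obj.dict(), Dumper=UrlDumper), and code that uses the default Dumper keeps its existing behavior.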

huangapple
  • This article was published on 2023-06-02 06:01:58.
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76385999.html