How to export a Pydantic model instance as YAML with URL type as string
Question
I have a Pydantic model with a field of type AnyUrl.
When exporting the model to YAML, the AnyUrl is serialized as individual field slots instead of a single string URL (perhaps due to how the AnyUrl.__repr__ method is implemented).
For example:
from pydantic import BaseModel, AnyUrl
import yaml

class MyModel(BaseModel):
    url: AnyUrl

data = {'url': 'https://www.example.com'}
model = MyModel.parse_obj(data)
y = yaml.dump(model.dict(), indent=4)
print(y)
Produces:
url: !!python/object/new:pydantic.networks.AnyUrl
    args:
    - https://www.example.com
    state: !!python/tuple
    - null
    -   fragment: null
        host: www.example.com
        host_type: domain
        password: null
        path: null
        port: null
        query: null
        scheme: https
        tld: com
        user: null
Ideally, I would like the serialized YAML to contain https://www.example.com instead of individual fields.
I have tried to override the __repr__ method of AnyUrl to return the AnyUrl object itself, as it extends the str class, but no luck.
Answer 1
Score: 1
Unfortunately, the pyyaml documentation is just horrendous, so seemingly elementary things like customizing (de-)serialization are a pain to figure out properly. But there are essentially two ways you could solve this.
Option A: Subclass YAMLObject
You had the right idea of subclassing AnyUrl, but the __repr__ method is irrelevant for YAML serialization. For that you need to do three things:
- inherit from YAMLObject,
- define a custom yaml_tag, and
- override the to_yaml classmethod.
Then pyyaml will serialize this custom class (which inherits from both AnyUrl and YAMLObject) in accordance with what you define in to_yaml.
The to_yaml method always receives exactly two arguments:
- a yaml.Dumper instance with built-in capabilities to serialize standard types (via methods like represent_str, for example), and
- the actual data to be serialized.
To avoid adding/overriding additional methods, you can leverage the fact that AnyUrl inherits from str and the underlying str.__new__ method actually receives the full URL during construction. Therefore the str.__str__ method will return that URL as is.
from pydantic import AnyUrl, BaseModel
from yaml import Dumper, ScalarNode, YAMLObject, dump, safe_load

class Url(AnyUrl, YAMLObject):
    yaml_tag = "!Url"

    @classmethod
    def to_yaml(cls, dumper: Dumper, data: str) -> ScalarNode:
        # Emit the URL as a plain YAML string scalar.
        return dumper.represent_str(str.__str__(data))

class MyModel(BaseModel):
    foo: int = 0
    url: Url

obj = MyModel.parse_obj({"url": "https://www.example.com"})
print(obj)
serialized = dump(obj.dict()).strip()
print(serialized)
deserialized = MyModel.parse_obj(safe_load(serialized))
print(deserialized == obj and isinstance(deserialized.url, Url))
Output:
foo=0 url=Url('https://www.example.com', scheme='https', host='www.example.com', tld='com', host_type='domain')
foo: 0
url: https://www.example.com
True
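The point that __repr__ plays no part in the dump can be checked without pydantic. The following is only a minimal sketch: FakeUrl and TaggedUrl are hypothetical stand-ins (not pydantic classes) for a str subclass that carries extra state, which is assumed here to be the property of AnyUrl that matters.

```python
import yaml


class FakeUrl(str):
    """Hypothetical stand-in for AnyUrl: a str subclass carrying extra state."""

    def __new__(cls, url: str, scheme: str = "https") -> "FakeUrl":
        obj = super().__new__(cls, url)
        obj.scheme = scheme  # extra attribute, loosely like AnyUrl's parsed parts
        return obj

    def __repr__(self) -> str:
        # Overriding __repr__ does not change what yaml.dump emits.
        return str.__str__(self)


class TaggedUrl(FakeUrl, yaml.YAMLObject):
    yaml_tag = "!Url"

    @classmethod
    def to_yaml(cls, dumper: yaml.Dumper, data: str) -> yaml.ScalarNode:
        # Represent the value as a plain string scalar.
        return dumper.represent_str(str.__str__(data))


# The default Dumper falls back to a Python-specific object tag here.
before = yaml.dump({"url": FakeUrl("https://www.example.com")})
# With to_yaml defined, the same value becomes a plain string.
after = yaml.dump({"url": TaggedUrl("https://www.example.com")})

print(before)
print(after)
```

The first dump still contains a !!python/object/new tag despite the custom __repr__, while the second one is just the URL string.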
Option B: Register a representer function for AnyUrl
You can avoid defining your own subclass and instead globally register a function that defines how instances of AnyUrl should be serialized, by using the yaml.add_representer function.
That function takes two mandatory arguments:
- The class for which you want to define your custom serialization behavior and
- the representer function that defines that serialization behavior.
The representer function essentially has to have the same signature as the YAMLObject.to_yaml classmethod presented in option A, i.e. it takes a Dumper instance and the data to be serialized as arguments.
from pydantic import AnyUrl, BaseModel
from yaml import Dumper, ScalarNode, add_representer, dump, safe_load

def url_representer(dumper: Dumper, data: AnyUrl) -> ScalarNode:
    # Emit the URL as a plain YAML string scalar.
    return dumper.represent_str(str.__str__(data))

add_representer(AnyUrl, url_representer)

class MyModel(BaseModel):
    foo: int = 0
    url: AnyUrl

obj = MyModel.parse_obj({"url": "https://www.example.com"})
print(obj)
serialized = dump(obj.dict()).strip()
print(serialized)
deserialized = MyModel.parse_obj(safe_load(serialized))
print(deserialized == obj and isinstance(deserialized.url, AnyUrl))
Output is the same as with the code from option A.
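One thing to watch with option B: add_representer matches the exact type only, so a field annotated with a subclass of AnyUrl (HttpUrl, for instance) would presumably not be covered. pyyaml also has add_multi_representer, which matches subclasses too. A rough sketch of the difference, where BaseUrl and SubUrl are hypothetical stand-ins for AnyUrl and one of its subclasses, and the two Dumper subclasses exist only to keep the demo away from global state:

```python
import yaml


class BaseUrl(str):
    """Hypothetical stand-in for AnyUrl."""


class SubUrl(BaseUrl):
    """Hypothetical stand-in for a subclass of AnyUrl."""


def url_representer(dumper: yaml.Dumper, data: str) -> yaml.ScalarNode:
    return dumper.represent_str(str.__str__(data))


class ExactDumper(yaml.Dumper):
    """Gets a representer for exactly BaseUrl."""


class MultiDumper(yaml.Dumper):
    """Gets a representer for BaseUrl and all of its subclasses."""


ExactDumper.add_representer(BaseUrl, url_representer)
MultiDumper.add_multi_representer(BaseUrl, url_representer)

value = {"url": SubUrl("https://www.example.com")}
exact = yaml.dump(value, Dumper=ExactDumper)  # subclass is not matched
multi = yaml.dump(value, Dumper=MultiDumper)  # subclass is matched

print(exact)
print(multi)
```

Only the second dump yields the plain URL string; the first falls back to a Python object tag because SubUrl itself was never registered.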
The benefit of this approach is that it involves less code and avoids potential namespace collisions between the two parent classes in option A.
A potential drawback is that it modifies a global setting for the entire runtime of the program, which can become less transparent as your application grows. It is just something to be aware of in case you decide you want to serialize AnyUrl objects differently at some point.
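If the global registration is a concern, one possible way around it is to register the representer on your own Dumper subclass and pass that to dump explicitly. This is a sketch under assumptions, not part of the approach above: FakeUrl is a hypothetical stand-in for AnyUrl and UrlDumper is a made-up name.

```python
import yaml


class FakeUrl(str):
    """Hypothetical stand-in for AnyUrl."""


def url_representer(dumper: yaml.SafeDumper, data: str) -> yaml.ScalarNode:
    return dumper.represent_str(str.__str__(data))


class UrlDumper(yaml.SafeDumper):
    """Dumper that knows about FakeUrl; the stock dumpers stay untouched."""


# Registering on the subclass copies the representer table first,
# so yaml.Dumper and yaml.SafeDumper are not modified.
UrlDumper.add_representer(FakeUrl, url_representer)

doc = {"foo": 0, "url": FakeUrl("https://www.example.com")}
scoped = yaml.dump(doc, Dumper=UrlDumper)
print(scoped)
```

Basing the subclass on SafeDumper is a choice made for this sketch: unregistered types then raise an error instead of silently producing Python-specific tags. Subclassing yaml.Dumper should work the same way if you prefer the default behavior.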