Is there a way to convert a file path field to a parsed model in-place?

Question

I have two models. The second has a file path field referencing a file whose contents are described by the first model. Is it possible to expand the file contents in place (that is, replace the file path with the parsed model)?

Sample models:

from pydantic import BaseModel, FilePath


class FirstModel(BaseModel):
    str_data: str
    num_list: list[int | float]


class SecondModel(BaseModel):
    some_other_field: str
    first_model: FilePath

Sample data (the contents of the referenced file):

{
  "str_data": "Some string data up in here",
  "num_list": [1, 2, 3.14]
}

Desired result:

>>> SecondModel(some_other_field="Other field data", first_model="path/to/data.json")
SecondModel(some_other_field="Other field data", first_model=FirstModel(str_data="Some string data up in here", num_list=[1, 2, 3.14]))

So initially I would like the first_model field to be given as a file path, which is then parsed so that the field ends up with type FirstModel. Is this possible?

I've tried different approaches using validators, subclassing the first model, and custom root types.

Answer 1

Score: 0

First of all, the field type should reflect what you actually want to end up with after you parse data with your model. So the annotation for first_model should not be FilePath, but FirstModel.

Then it is still possible to initialize SecondModel "normally" by passing first_model either a dictionary with the correct key-value pairs or an actual instance of FirstModel. But you can also write a custom field validator with pre=True that handles the case where someone provides a file path instead of "valid" data.

There are a few ways to achieve this. The simplest approach I can think of is to first assume that the value is a valid file path that can be opened and read. If that succeeds, we can assume the contents can be parsed directly via FirstModel. If it fails, we just return the value unchanged and let the default validators take care of the rest.

Assume we have the following data in a file called test.json in our current working directory:

{
  "str_data": "foo",
  "num_list": [1, 2, 3.14]
}

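Here is a working implementation (written against the Pydantic v1 API; validator, parse_raw, and parse_obj are deprecated in Pydantic v2):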

from pathlib import Path

from pydantic import BaseModel, validator


class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]


class SecondModel(BaseModel):
    some_other_field: str
    first_model: FirstModel

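    @validator("first_model", pre=True)
    def load_json_to_first_model(cls, v: object) -> object:
        # Runs before the default validation, so v is the raw input value.
        try:
            # Optimistically treat the value as a path to a readable file.
            contents = Path(str(v)).read_text()
        except (TypeError, OSError):
            # Not a readable file: pass the value on to the default validators.
            return v
        # The file contents are expected to be JSON describing a FirstModel.
        return FirstModel.parse_raw(contents)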


if __name__ == "__main__":
    obj = SecondModel.parse_obj({
        "some_other_field": "bar",
        "first_model": "test.json",
    })
    print(obj)

Output:

some_other_field='bar' first_model=FirstModel(str_data='foo', num_list=[1.0, 2.0, 3.14])
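
To illustrate that the pre-validator does not get in the way of "normal" initialization, here is a small sketch that builds SecondModel from a file path, from a dictionary, and from a FirstModel instance. It reuses the models from above with the same Pydantic v1-style API; the build_examples helper and the temporary test.json file are made up for this illustration.

```python
import json
import tempfile
from pathlib import Path

from pydantic import BaseModel, validator


class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]


class SecondModel(BaseModel):
    some_other_field: str
    first_model: FirstModel

    @validator("first_model", pre=True)
    def load_json_to_first_model(cls, v: object) -> object:
        try:
            contents = Path(str(v)).read_text()
        except (TypeError, OSError):
            # Not a readable file: fall back to the default validation.
            return v
        return FirstModel.parse_raw(contents)


def build_examples() -> list[SecondModel]:
    """Hypothetical helper: build the same model from three input forms."""
    data = {"str_data": "foo", "num_list": [1, 2, 3.14]}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "test.json"  # temporary stand-in for test.json
        path.write_text(json.dumps(data))
        # The file is read while the model is constructed, inside the block.
        from_path = SecondModel(some_other_field="bar", first_model=str(path))
    from_dict = SecondModel(some_other_field="bar", first_model=data)
    from_instance = SecondModel(
        some_other_field="bar", first_model=FirstModel(**data)
    )
    return [from_path, from_dict, from_instance]


if __name__ == "__main__":
    for example in build_examples():
        print(example)
```

All three objects should compare equal, since the dictionary and the instance simply fail the file read and fall through to the default validation.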

If we provide an invalid path or the file cannot be opened, the error we get will simply come from the default validator telling us that first_model is not a valid dictionary. You can customize this further in your custom validator if you want, for example by differentiating how you handle PermissionError and FileNotFoundError instead of catching the base OSError.
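
One possible way to differentiate the errors is sketched below. This is only one design among several and it changes the behavior slightly: here only str and Path values are treated as candidate file paths, and a failed read raises a ValueError with a specific message (which Pydantic turns into a validation error) instead of falling through to the default validator. The error messages are invented for this example.

```python
from pathlib import Path

from pydantic import BaseModel, ValidationError, validator


class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]


class SecondModel(BaseModel):
    some_other_field: str
    first_model: FirstModel

    @validator("first_model", pre=True)
    def load_json_to_first_model(cls, v: object) -> object:
        # Assumption: only strings and Path objects are meant as file paths;
        # dictionaries and FirstModel instances go to the default validation.
        if not isinstance(v, (str, Path)):
            return v
        try:
            contents = Path(v).read_text()
        except FileNotFoundError:
            raise ValueError(f"file not found: {v}") from None
        except PermissionError:
            raise ValueError(f"no permission to read file: {v}") from None
        except OSError as exc:
            # Any other I/O problem, e.g. the path points to a directory.
            raise ValueError(f"could not read file {v}: {exc}") from None
        return FirstModel.parse_raw(contents)


if __name__ == "__main__":
    try:
        SecondModel(some_other_field="bar", first_model="no/such/file.json")
    except ValidationError as exc:
        print(exc)
```

With this variant, a mistyped path is reported as a missing file rather than as "not a valid dictionary", which may be easier to debug.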

Side note: for type checking purposes, a union of float | int is equivalent to float, because PEP 484 allows an int wherever a float is expected, even though there is technically no subclass relationship. This means you can omit the int. Pydantic will then convert all values to float. (See the Pydantic documentation on that matter.)
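
A quick check of that conversion, using the FirstModel definition from above:

```python
from pydantic import BaseModel


class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]


model = FirstModel(str_data="foo", num_list=[1, 2, 3.14])
print(model.num_list)  # [1.0, 2.0, 3.14]
print([type(x).__name__ for x in model.num_list])  # ['float', 'float', 'float']
```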

  • Posted by huangapple on January 9, 2023, 04:49:35
  • Source link: https://go.coder-hub.com/75051182.html