Create dataclass instance from union type based on string literal

Question

I'm trying to strongly type our code base. A big part of the code handles events that come from external devices and forwards them to different handlers. These events all have a value attribute, but this value can have different types. The value type is determined by the event name. So a temperature event always has an int value, and a register event always has a RegisterInfo as its value.

So I would like to map the event name to the value type. But we are struggling with implementation.

This setup comes the closest to what we want:

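from dataclasses import dataclass
from typing import Any, Literal, TypeAlias

# RegisterInfo is assumed to be defined elsewhere in the code base.

@dataclass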
class EventBase:
    name: str
    value: Any
    value_type: str

@dataclass
class RegisterEvent(EventBase):
    value: RegisterInfo
    name: Literal["register"]
    value_type: Literal["RegisterInfo"] = "RegisterInfo"


@dataclass
class NumberEvent(EventBase):
    value: float | int
    name: Literal["temperature", "line_number"]
    value_type: Literal["number"] = "number"

@dataclass
class StringEvent(EventBase):
    value: str
    name: Literal["warning", "status"]
    value_type: Literal["string"] = "string"


Events: TypeAlias = RegisterEvent | NumberEvent | StringEvent

With this setup mypy will flag incorrect code like:

def handle_event(event: Events):
    if event.name == "temperature":
        event.value.upper()

(It sees that a temperature event has a numeric value, float | int, which does not have an upper() method.)

But creating the events becomes ugly this way. I don't want a big if statement that maps each event name to a specific event class. We have lots of different event types, and this mapping info is already inside these classes.

Ideally I would like it to look like this:

def handle_device_message(message_info):
    event_name = message_info["event_name"]
    event_value = message_info["event_value"]

    event = Events(event_name, event_value)

Is a "one-liner" like this possible?

I feel like we are hitting a wall here. Could it be that the code is architecturally wrong?

Answer 1

Score: 3

UPDATE: Using Pydantic v2

If you are willing to switch to Pydantic instead of dataclasses, you can define a discriminated union via typing.Annotated and use the TypeAdapter as a "universal" constructor that is able to discriminate between distinct Event subtypes based on the provided name string.

Here is what I would suggest:

from typing import Annotated, Any, Literal

from pydantic import BaseModel, Field, TypeAdapter


class EventBase(BaseModel):
    name: str
    value: Any


class NumberEvent(EventBase):
    name: Literal["temperature", "line_number"]
    value: float


class StringEvent(EventBase):
    name: Literal["warning", "status"]
    value: str


Event = TypeAdapter(Annotated[
    NumberEvent | StringEvent,
    Field(discriminator="name"),
])


event_temp = Event.validate_python({"name": "temperature", "value": 3.14})
event_status = Event.validate_python({"name": "status", "value": "spam"})

print(repr(event_temp))    # NumberEvent(name='temperature', value=3.14)
print(repr(event_status))  # StringEvent(name='status', value='spam')

An invalid name would of course cause a validation error, just like a completely wrong type for value (one that cannot be coerced). Example:

from pydantic import ValidationError

try:
    Event.validate_python({"name": "temperature", "value": "foo"})
except ValidationError as err:
    print(err.json(indent=4))

try:
    Event.validate_python({"name": "foo", "value": "bar"})
except ValidationError as err:
    print(err.json(indent=4))

Output:

[
    {
        "type": "float_parsing",
        "loc": [
            "temperature",
            "value"
        ],
        "msg": "Input should be a valid number, unable to parse string as a number",
        "input": "foo",
        "url": "https://errors.pydantic.dev/2.1/v/float_parsing"
    }
]
[
    {
        "type": "union_tag_invalid",
        "loc": [],
        "msg": "Input tag 'foo' found using 'name' does not match any of the expected tags: 'temperature', 'line_number', 'warning', 'status'",
        "input": {
            "name": "foo",
            "value": "bar"
        },
        "ctx": {
            "discriminator": "'name'",
            "tag": "foo",
            "expected_tags": "'temperature', 'line_number', 'warning', 'status'"
        },
        "url": "https://errors.pydantic.dev/2.1/v/union_tag_invalid"
    }
]

Original Answer: Using Pydantic v1

If you are willing to switch to Pydantic instead of dataclasses, you can define a discriminated union via typing.Annotated and use the parse_obj_as function as a "universal" constructor that is able to discriminate between distinct Event subtypes based on the provided name string.

Here is what I would suggest:

from typing import Annotated, Any, Literal

from pydantic import BaseModel, Field, parse_obj_as


class EventBase(BaseModel):
    name: str
    value: Any


class NumberEvent(EventBase):
    name: Literal["temperature", "line_number"]
    value: float


class StringEvent(EventBase):
    name: Literal["warning", "status"]
    value: str


Event = Annotated[
    NumberEvent | StringEvent,
    Field(discriminator="name"),
]


event_temp = parse_obj_as(Event, {"name": "temperature", "value": "3.14"})
event_status = parse_obj_as(Event, {"name": "status", "value": -10})

print(repr(event_temp))    # NumberEvent(name='temperature', value=3.14)
print(repr(event_status))  # StringEvent(name='status', value='-10')

In this usage demo I purposefully used the "wrong" types for the respective value fields to show that Pydantic will automatically try to coerce them to the right types, once it determines the correct model based on the provided name.

An invalid name would of course cause a validation error, just like a completely wrong type for value (one that cannot be coerced). Example:

from pydantic import ValidationError

try:
    parse_obj_as(Event, {"name": "temperature", "value": "foo"})
except ValidationError as err:
    print(err.json(indent=4))

try:
    parse_obj_as(Event, {"name": "foo", "value": "bar"})
except ValidationError as err:
    print(err.json(indent=4))

Output:

[
    {
        "loc": [
            "__root__",
            "NumberEvent",
            "value"
        ],
        "msg": "value is not a valid float",
        "type": "type_error.float"
    }
]
[
    {
        "loc": [
            "__root__"
        ],
        "msg": "No match for discriminator 'name' and value 'foo' (allowed values: 'temperature', 'line_number', 'warning', 'status')",
        "type": "value_error.discriminated_union.invalid_discriminator",
        "ctx": {
            "discriminator_key": "name",
            "discriminator_value": "foo",
            "allowed_values": "'temperature', 'line_number', 'warning', 'status'"
        }
    }
]
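
If adding Pydantic is not an option, the name-based dispatch can be approximated with plain dataclasses by deriving a lookup table from the Literal annotations that the classes already carry. The following is only a sketch: build_registry, EVENT_CLASSES and make_event are hypothetical names, and unlike Pydantic this does not validate or coerce value at runtime.

```python
from dataclasses import dataclass
from typing import Any, Literal, get_args, get_type_hints


@dataclass
class EventBase:
    name: str
    value: Any


@dataclass
class NumberEvent(EventBase):
    name: Literal["temperature", "line_number"]
    value: float


@dataclass
class StringEvent(EventBase):
    name: Literal["warning", "status"]
    value: str


def build_registry(*event_classes: type[EventBase]) -> dict[str, type[EventBase]]:
    """Map every literal name declared on a class to that class."""
    registry: dict[str, type[EventBase]] = {}
    for cls in event_classes:
        # The "name" annotation is a Literal[...]; get_args yields its strings.
        name_hint = get_type_hints(cls)["name"]
        for event_name in get_args(name_hint):
            registry[event_name] = cls
    return registry


EVENT_CLASSES = build_registry(NumberEvent, StringEvent)


def make_event(name: str, value: Any) -> EventBase:
    """Create the event class registered for this name (no value validation)."""
    try:
        event_class = EVENT_CLASSES[name]
    except KeyError:
        raise ValueError(f"unknown event name: {name!r}") from None
    return event_class(name=name, value=value)


print(make_event("temperature", 21.5))
print(make_event("status", "ok"))
```

The trade-off is that a static type checker only sees EventBase as the return type here, so the per-name narrowing from the question is lost at the construction site.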

Side notes

An alias for a union of types like NumberEvent | StringEvent should still have a singular name, i.e. Event rather than Events, because semantically the annotation e: Event indicates that e should be an instance of one of those types, whereas e: Events would suggest that e is a collection of instances of those types.

Also, the union float | int is almost always equivalent to float, because PEP 484 specifies that type checkers accept an int wherever a float is expected (even though int is not a subclass of float at runtime).
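
A small sketch of that last point, using a hypothetical to_fahrenheit function: the parameter is annotated with float only, yet passing an int satisfies a type checker, while the runtime class hierarchy stays separate.

```python
def to_fahrenheit(celsius: float) -> float:
    """Annotated with float only; type checkers also accept int arguments."""
    return celsius * 9 / 5 + 32


print(to_fahrenheit(100))   # int argument, accepted by static type checkers
print(to_fahrenheit(36.6))  # float argument

# The int-to-float promotion is a static typing rule only.
print(issubclass(int, float))  # False
```

This is why a runtime check such as isinstance(value, float) can still reject an int even though the annotation float admits it.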

huangapple
  • Published on June 1, 2023, 16:16:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76379924.html