Create dataclass instance from union type based on string literal
Question
I'm trying to strongly type our code base. A big part of the code handles events that come from external devices and forwards them to different handlers. These events all have a value attribute, but this value can have different types. The value type is determined by the event name: a temperature event always has an int value, and a register event always has a RegisterInfo as its value.
So I would like to map the event name to the value type. But we are struggling with implementation.
This setup comes the closest to what we want:
from dataclasses import dataclass
from typing import Any, Literal, TypeAlias

# RegisterInfo is a class defined elsewhere in our code base.

@dataclass
class EventBase:
    name: str
    value: Any
    value_type: str

@dataclass
class RegisterEvent(EventBase):
    value: RegisterInfo
    name: Literal["register"]
    value_type: Literal["RegisterInfo"] = "RegisterInfo"

@dataclass
class NumberEvent(EventBase):
    value: float | int
    name: Literal["temperature", "line_number"]
    value_type: Literal["number"] = "number"

@dataclass
class StringEvent(EventBase):
    value: str
    name: Literal["warning", "status"]
    value_type: Literal["string"] = "string"

Events: TypeAlias = RegisterEvent | NumberEvent | StringEvent
With this setup mypy will flag incorrect code like:
def handle_event(event: Events):
    if event.name == "temperature":
        event.value.upper()
(It sees that a temperature event should have a numeric value, float | int here, and that has no upper() method.)
But creating the events becomes ugly this way. I don't want a big if statement that maps each event name to a specific event class. We have lots of different event types, and this mapping info is already inside these classes.
Ideally I would like it to look like this:
def handle_device_message(message_info):
    event_name = message_info["event_name"]
    event_value = message_info["event_value"]
    event = Events(event_name, event_value)
Is a "one-liner" like this possible?
I feel like we are running into a wall here. Could it be that the code is architecturally wrong?
Answer 1
Score: 3
UPDATE: Using Pydantic v2
If you are willing to switch to Pydantic instead of dataclasses, you can define a discriminated union via typing.Annotated and use a TypeAdapter as a "universal" constructor that is able to discriminate between distinct Event subtypes based on the provided name string.
Here is what I would suggest:
from typing import Annotated, Any, Literal
from pydantic import BaseModel, Field, TypeAdapter

class EventBase(BaseModel):
    name: str
    value: Any

class NumberEvent(EventBase):
    name: Literal["temperature", "line_number"]
    value: float

class StringEvent(EventBase):
    name: Literal["warning", "status"]
    value: str

Event = TypeAdapter(Annotated[
    NumberEvent | StringEvent,
    Field(discriminator="name"),
])

event_temp = Event.validate_python({"name": "temperature", "value": 3.14})
event_status = Event.validate_python({"name": "status", "value": "spam"})
print(repr(event_temp))    # NumberEvent(name='temperature', value=3.14)
print(repr(event_status))  # StringEvent(name='status', value='spam')
An invalid name would of course cause a validation error, just like a completely wrong type for value (one that cannot be coerced). Example:
from pydantic import ValidationError

try:
    Event.validate_python({"name": "temperature", "value": "foo"})
except ValidationError as err:
    print(err.json(indent=4))

try:
    Event.validate_python({"name": "foo", "value": "bar"})
except ValidationError as err:
    print(err.json(indent=4))
Output:
[
    {
        "type": "float_parsing",
        "loc": [
            "temperature",
            "value"
        ],
        "msg": "Input should be a valid number, unable to parse string as a number",
        "input": "foo",
        "url": "https://errors.pydantic.dev/2.1/v/float_parsing"
    }
]
[
    {
        "type": "union_tag_invalid",
        "loc": [],
        "msg": "Input tag 'foo' found using 'name' does not match any of the expected tags: 'temperature', 'line_number', 'warning', 'status'",
        "input": {
            "name": "foo",
            "value": "bar"
        },
        "ctx": {
            "discriminator": "'name'",
            "tag": "foo",
            "expected_tags": "'temperature', 'line_number', 'warning', 'status'"
        },
        "url": "https://errors.pydantic.dev/2.1/v/union_tag_invalid"
    }
]
Original Answer: Using Pydantic v1
If you are willing to switch to Pydantic instead of dataclasses, you can define a discriminated union via typing.Annotated and use the parse_obj_as function as a "universal" constructor that is able to discriminate between distinct Event subtypes based on the provided name string.
Here is what I would suggest:
from typing import Annotated, Any, Literal
from pydantic import BaseModel, Field, parse_obj_as

class EventBase(BaseModel):
    name: str
    value: Any

class NumberEvent(EventBase):
    name: Literal["temperature", "line_number"]
    value: float

class StringEvent(EventBase):
    name: Literal["warning", "status"]
    value: str

Event = Annotated[
    NumberEvent | StringEvent,
    Field(discriminator="name"),
]

event_temp = parse_obj_as(Event, {"name": "temperature", "value": "3.14"})
event_status = parse_obj_as(Event, {"name": "status", "value": -10})
print(repr(event_temp))    # NumberEvent(name='temperature', value=3.14)
print(repr(event_status))  # StringEvent(name='status', value='-10')
In this usage demo I purposefully used the "wrong" types for the respective value fields to show that Pydantic will automatically try to coerce them to the right types once it determines the correct model based on the provided name.
An invalid name would of course cause a validation error, just like a completely wrong type for value (one that cannot be coerced). Example:
from pydantic import ValidationError

try:
    parse_obj_as(Event, {"name": "temperature", "value": "foo"})
except ValidationError as err:
    print(err.json(indent=4))

try:
    parse_obj_as(Event, {"name": "foo", "value": "bar"})
except ValidationError as err:
    print(err.json(indent=4))
Output:
[
    {
        "loc": [
            "__root__",
            "NumberEvent",
            "value"
        ],
        "msg": "value is not a valid float",
        "type": "type_error.float"
    }
]
[
    {
        "loc": [
            "__root__"
        ],
        "msg": "No match for discriminator 'name' and value 'foo' (allowed values: 'temperature', 'line_number', 'warning', 'status')",
        "type": "value_error.discriminated_union.invalid_discriminator",
        "ctx": {
            "discriminator_key": "name",
            "discriminator_value": "foo",
            "allowed_values": "'temperature', 'line_number', 'warning', 'status'"
        }
    }
]
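If switching to Pydantic is not an option, a similar name-based constructor can be sketched with plain dataclasses and the standard library. This is only a rough sketch under a few assumptions: build_registry, EVENT_CLASSES, make_event and the typed handle_device_message are hypothetical names that do not appear in the question, the event classes are trimmed to name and value, and nothing here validates or coerces the value at runtime the way Pydantic does. The idea is to read the Literal names off each class annotation once and reuse that mapping, so no hand-written if chain is needed.

```python
from dataclasses import dataclass
from typing import Any, Literal, get_args, get_type_hints


@dataclass
class EventBase:
    name: str
    value: Any


@dataclass
class NumberEvent(EventBase):
    name: Literal["temperature", "line_number"]
    value: float


@dataclass
class StringEvent(EventBase):
    name: Literal["warning", "status"]
    value: str


def build_registry(*event_classes: type[EventBase]) -> dict[str, type[EventBase]]:
    """Map every literal event name to the class that declares it (hypothetical helper)."""
    registry: dict[str, type[EventBase]] = {}
    for cls in event_classes:
        # get_args(Literal["a", "b"]) returns the tuple ("a", "b").
        for name in get_args(get_type_hints(cls)["name"]):
            if name in registry:
                raise ValueError(f"duplicate event name: {name!r}")
            registry[name] = cls
    return registry


EVENT_CLASSES = build_registry(NumberEvent, StringEvent)


def make_event(name: str, value: Any) -> EventBase:
    """Pick the event class from the name and instantiate it (no value validation)."""
    try:
        cls = EVENT_CLASSES[name]
    except KeyError:
        raise ValueError(f"unknown event name: {name!r}") from None
    return cls(name, value)


def handle_device_message(message_info: dict[str, Any]) -> EventBase:
    # Assumes the message keys from the question: "event_name" and "event_value".
    return make_event(message_info["event_name"], message_info["event_value"])


event = handle_device_message({"event_name": "temperature", "event_value": 21.5})
print(event)  # NumberEvent(name='temperature', value=21.5)
```

The trade-off is that the static return type is only EventBase, so handlers need an isinstance check (or a cast to the union alias) before the type checker can narrow value, and a wrongly typed value is not rejected when the event is constructed.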
Side notes
An alias for a union of types like NumberEvent | StringEvent should still have a singular name, i.e. Event rather than Events, because semantically the annotation e: Event indicates that e should be an instance of one of those types, whereas e: Events would suggest that e is a collection of multiple instances of those types.
Also, the union float | int is almost always equivalent to float, because type checkers by convention accept an int wherever a float is expected.
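To illustrate the "almost always": the convention applies to static type checking only. A small sketch (halve is a hypothetical function used just for this demonstration) shows that an int argument is accepted for a float parameter, while a runtime isinstance check still tells the two apart.

```python
def halve(x: float) -> float:
    """Annotated with float only; type checkers also accept an int argument."""
    return x / 2


print(halve(3))              # 1.5
print(isinstance(3, float))  # False: at runtime int is not a subclass of float
```

So code that branches on isinstance(value, float) at runtime is one case where float and float | int are not interchangeable.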