What is the proper type annotation for any object that can be unpacked with the ** operator?
Question
I have a function that looks like this:
from pandas import Series

def my_function(unpackable: dict | Series) -> None:
    {**unpackable}
I would like to type-hint any object that can be unpacked with the ** operator, while excluding those that cannot. I thought about typing.Mapping, but it appears isinstance(Series({"a": 1}), Mapping) is False.
What is the proper type hint for any object that can be unpacked with the ** operator in Python?
Answer 1
Score: 2
TL;DR
In most cases, collections.abc.Mapping[K, V] will be just fine.
The widest possible valid annotation is a (generic) protocol implementing the __getitem__ and keys methods:
from collections.abc import Iterable, Hashable
from typing import Protocol, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V", covariant=True)

class DoubleStarUnpackable(Protocol[K, V]):
    def keys(self) -> Iterable[K]: ...
    def __getitem__(self, key: K, /) -> V: ...
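As a quick check of that annotation, here is a sketch that repeats the DoubleStarUnpackable protocol (so the block runs on its own) and uses it in a hypothetical merge_unpackables helper. The Pair class is also made up for illustration. A plain dict and a class with no Mapping base both fit the protocol:

```python
from collections.abc import Hashable, Iterable
from typing import Protocol, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V", covariant=True)


class DoubleStarUnpackable(Protocol[K, V]):
    def keys(self) -> Iterable[K]: ...
    def __getitem__(self, key: K, /) -> V: ...


class Pair:
    """Hypothetical duck-typed container with no Mapping base class."""

    def keys(self) -> list[str]:
        return ["x", "y"]

    def __getitem__(self, key: str, /) -> int:
        return {"x": 1, "y": 2}[key]


def merge_unpackables(
    first: DoubleStarUnpackable[str, int],
    second: DoubleStarUnpackable[str, int],
) -> dict[str, int]:
    # ** relies only on keys() and __getitem__, which the protocol requires.
    return {**first, **second}


merged = merge_unpackables({"a": 0}, Pair())
print(merged)
```

Because the protocol names only keys and __getitem__, a type checker should accept any argument providing those two methods with compatible types, whether or not it inherits from Mapping.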
Digging deeper
This topic is not as straightforward as you might think at first glance. To find out what an appropriate type annotation for "unpackable" objects is, we need to dig into a few different sources.
What do you actually need for unpacking?
Since PEP 448, the unpacking operators * and ** can be used in multiple different circumstances. The restriction on the ** operand type is explicitly mentioned in the official Expressions documentation:
> A double asterisk ** denotes dictionary unpacking. Its operand must be a mapping.
The term mapping is further defined as a
> container object that supports arbitrary key lookups and implements the methods specified in [...] collections.abc.Mapping [...].
To see what the specific methods of a collections ABC are, I find it most helpful to check this table.
But what may be surprising (at least it was for me) is that not all of those are actually necessary for unpacking to work at runtime.<sup>1</sup> By doing a bit of experimentation, we can see which methods are necessary. It turns out that all you need is a __getitem__ and a keys implementation.
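One way to run that experiment is a small instrumented class that records which methods the interpreter actually calls during {**obj}. The Recorder class and its calls log are hypothetical names for illustration. In CPython, keys() is called once and __getitem__ once per key, while __len__ and __iter__ are never touched; other implementations may differ, in line with the footnote:

```python
from collections.abc import Iterator


class Recorder:
    """Hypothetical mapping-like class that logs which methods get called."""

    def __init__(self, data: dict[str, int]) -> None:
        self._data = data
        self.calls: list[str] = []

    def keys(self) -> list[str]:
        self.calls.append("keys")
        return list(self._data)

    def __getitem__(self, key: str) -> int:
        self.calls.append(f"__getitem__({key!r})")
        return self._data[key]

    # The two methods below exist only to check whether ** ever uses them.
    def __len__(self) -> int:
        self.calls.append("__len__")
        return len(self._data)

    def __iter__(self) -> Iterator[str]:
        self.calls.append("__iter__")
        return iter(self._data)


rec = Recorder({"a": 1, "b": 2})
unpacked = {**rec}
print(unpacked)
print(rec.calls)
```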
Minimal example:
class Foo:
    def __getitem__(self, item: str) -> int:
        if item != "a":
            raise KeyError(item)
        return 1

    def keys(self) -> str:
        # A str is an Iterable[str], so iterating over "a" yields the single key "a".
        return "a"
Demo:
def f(a: int) -> None:
    print(f"{a=}")

f(**Foo())
print({**Foo()})
Output:
a=1
{'a': 1}
You will notice that this also passes mypy --strict without errors.
But as soon as you remove either of those two methods, you will get an error both from mypy and at runtime.
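The runtime half of that claim can be checked with two deliberately incomplete classes, each missing one of the two methods (OnlyGetItem, OnlyKeys and unpack_fails are hypothetical names). In CPython both attempts raise TypeError; the exact messages vary between versions, so this sketch only checks the exception type:

```python
from typing import Any


class OnlyGetItem:
    """Hypothetical class with __getitem__ but no keys()."""

    def __getitem__(self, key: str) -> int:
        return 1


class OnlyKeys:
    """Hypothetical class with keys() but no __getitem__."""

    def keys(self) -> list[str]:
        return ["a"]


def unpack_fails(obj: Any) -> bool:
    """Return True if {**obj} raises TypeError at runtime."""
    try:
        _ = {**obj}  # obj is typed as Any, so mypy does not check this line
    except TypeError:
        return True
    return False


print(unpack_fails(OnlyGetItem()))  # missing keys()
print(unpack_fails(OnlyKeys()))     # missing __getitem__
print(unpack_fails({"a": 1}))       # a real dict unpacks fine
```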
So how do we annotate it?
It turns out that the good people at python/typeshed are also aware of this (no surprise here) and have defined a protocol for just that, called SupportsKeysAndGetItem:
from collections.abc import Iterable
from typing import Protocol, TypeVar

_KT = TypeVar("_KT")
_VT_co = TypeVar("_VT_co", covariant=True)

class SupportsKeysAndGetItem(Protocol[_KT, _VT_co]):
    def keys(self) -> Iterable[_KT]: ...
    def __getitem__(self, __key: _KT) -> _VT_co: ...
This obviously works just fine with our silly Foo class, and we can use it to annotate your function like this:
def my_function(unpackable: SupportsKeysAndGetItem[str, int]) -> None:
    unpacked = {**unpackable}
    print(f"{unpacked=}")

my_function(Foo())  # unpacked={'a': 1}
Again, mypy accepts this code without errors.
We can actually see that mypy uses this exact protocol from typeshed to check whether something is "unpackable". If we omit either the keys or the __getitem__ method from Foo and try to do {**Foo()}, the error message from mypy will tell us:
List item 0 has incompatible type "Foo"; expected "SupportsKeysAndGetItem[<nothing>, <nothing>]"
(Not sure what lists have to do with this, but the relevant bit here is that it tells us it expects something that implements the SupportsKeysAndGetItem protocol.)
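Instead of copying the protocol, it can be imported from typeshed's private _typeshed module. That module exists only in the stub files type checkers read, not at runtime, so the sketch below assumes the common pattern of guarding the import with TYPE_CHECKING and postponing annotation evaluation. This variant of my_function also returns the dict, purely so the result can be checked:

```python
from __future__ import annotations  # annotations stay unevaluated at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers execute this import; _typeshed does not exist at runtime.
    from _typeshed import SupportsKeysAndGetItem


def my_function(unpackable: SupportsKeysAndGetItem[str, int]) -> dict[str, int]:
    unpacked = {**unpackable}
    print(f"{unpacked=}")
    return unpacked  # returned only so the result can be inspected


result = my_function({"a": 1, "b": 2})
```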
Why not just use Mapping?
You could do that, of course, and in most situations that is just fine; it is exactly how I would annotate something that is supposed to be unpacked at some point. But there are still two relevant caveats to keep in mind.
Mapping is not a protocol!
Unlike other collections abstract base classes such as Iterable, Container, or Reversible, the collections.abc.Mapping class is not actually a protocol. The classes that are protocols are all listed here in PEP 544 as well as here in the mypy documentation.
The consequence is that structural subtyping will not work.
Even if I wrote a class Foo that implements all the Mapping methods (i.e. __getitem__, __iter__, __len__, as well as the inherited __contains__, keys, items, values, get, __eq__, and __ne__), a type checker would still complain if I tried to do m: Mapping = Foo().
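A runtime analogy may help build intuition (it is not how static type checkers work internally): Iterable defines a __subclasshook__ that recognises any class with __iter__, while Mapping has no such structural check, so a hypothetical DuckMapping class with the Mapping methods still fails isinstance against Mapping:

```python
from __future__ import annotations

from collections.abc import ItemsView, Iterable, Iterator, KeysView, Mapping, ValuesView


class DuckMapping:
    """Hypothetical class with the Mapping methods but no Mapping base class."""

    def __init__(self) -> None:
        self._data = {"a": 1}

    def __getitem__(self, key: str) -> int:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __contains__(self, key: object) -> bool:
        return key in self._data

    def keys(self) -> KeysView[str]:
        return self._data.keys()

    def items(self) -> ItemsView[str, int]:
        return self._data.items()

    def values(self) -> ValuesView[int]:
        return self._data.values()

    def get(self, key: str, default: int | None = None) -> int | None:
        return self._data.get(key, default)
    # __eq__ and __ne__ come from object.


duck = DuckMapping()
print(isinstance(duck, Iterable))  # True: Iterable.__subclasshook__ looks for __iter__
print(isinstance(duck, Mapping))   # False: Mapping does not check methods structurally
print({**duck})                    # {'a': 1}: unpacking only needs keys() and __getitem__
```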
Only nominal subtyping (i.e. inheriting from Mapping) would make this work. Here is another Stack Overflow question about this exact topic.
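For comparison, here is a sketch of the nominal route. Subclassing collections.abc.Mapping only requires __getitem__, __iter__ and __len__; the mixin methods (keys, items, values, get, __contains__, __eq__, __ne__) are inherited. FrozenConfig and takes_mapping are hypothetical names:

```python
from collections.abc import Iterator, Mapping


class FrozenConfig(Mapping[str, int]):
    """Hypothetical read-only mapping that inherits from Mapping nominally."""

    def __init__(self, **values: int) -> None:
        self._values = dict(values)

    def __getitem__(self, key: str) -> int:
        return self._values[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._values)

    def __len__(self) -> int:
        return len(self._values)


def takes_mapping(m: Mapping[str, int]) -> dict[str, int]:
    return {**m}


cfg = FrozenConfig(a=1, b=2)
print(isinstance(cfg, Mapping))
print(takes_mapping(cfg))
print(sorted(cfg.keys()))  # keys() is provided by the Mapping mixin
```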
pandas.Series is unpackable, but not a Mapping subtype
Since you brought up the Series class in your question, I am using it here as a stand-in for any class that you could use for unpacking.
Annotating your function like this would not allow a Series argument to be passed:
from collections.abc import Mapping
def my_function(unpackable: Mapping) -> None: ...
For example, mypy would complain if you did my_function(pd.Series()):
Argument 1 to "my_function" has incompatible type "Series[Any]"; expected "Mapping[Any, Any]"
So you would have to resort to specifically defining a union for the annotation, like Anton Petrov suggested in his answer.
But then what if someone would like to pass something that is neither a Mapping subclass nor a pd.Series, but is still unpackable?
This is basically the argument for making your function parameter type annotations as wide as possible.
Footnotes
<sup>1</sup> At least in the current CPython implementation. I could not find specific documentation for this.
Answer 2
Score: 1
The only requirement for the unpackable type is to follow the Mapping protocol, but that doesn't mean it has to inherit from Mapping.
So Mapping should be enough in most cases, but if you want to be more descriptive, and considering you care about Series specifically, you could create an Unpackable alias:
Unpackable = Union[Mapping, Series]
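A self-contained version of that alias might look like the sketch below. The imports are added here, and importing pandas only under TYPE_CHECKING with Series written as a string forward reference is an added assumption, used only to avoid a runtime pandas dependency for an annotation-only use:

```python
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Union

if TYPE_CHECKING:
    # Assumption: pandas is only needed by the type checker here.
    from pandas import Series

# "Series" is a string forward reference, so building the alias never imports pandas.
Unpackable = Union[Mapping, "Series"]


def my_function(unpackable: Unpackable) -> dict[Any, Any]:
    return {**unpackable}


print(my_function({"a": 1}))
```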
<s>But, actually, mypy will be happy if you provide Series for the Mapping type, so it's just a matter of what seems more readable to you.</s>