What is the proper type annotation for any object that can be unpacked with the ** operator?

Question

I have a function that looks like this:

from pandas import Series
def my_function(unpackable: dict | Series) -> None:
    {**unpackable}

I would actually like to type hint for any object that can be unpacked with the ** operator while excluding those that cannot. I thought about typing.Mapping, but it appears isinstance(Series({"a": 1}), Mapping) is False.

What is the proper type hint for any object that can be unpacked with the ** operator in Python?

Answer 1

Score: 2

TL;DR

In most cases collections.abc.Mapping[K, V] will be just fine.

The widest possible valid annotation is a (generic) protocol implementing the __getitem__ and keys methods:

from collections.abc import Iterable, Hashable
from typing import Protocol, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V", covariant=True)


class DoubleStarUnpackable(Protocol[K, V]):
    def keys(self) -> Iterable[K]: ...
    def __getitem__(self, key: K, /) -> V: ...
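
As a rough usage sketch of the protocol above: the Settings class and the to_dict function below are hypothetical names made up for illustration, not part of the original answer. The idea is that both a plain dict and an unrelated class with keys and __getitem__ should be acceptable arguments.

```python
from collections.abc import Hashable, Iterable
from typing import Protocol, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V", covariant=True)


class DoubleStarUnpackable(Protocol[K, V]):
    def keys(self) -> Iterable[K]: ...
    def __getitem__(self, key: K, /) -> V: ...


class Settings:
    """Hypothetical class: not a Mapping subclass, but it has both methods."""

    def __init__(self) -> None:
        self._values = {"host": "localhost", "port": "8080"}

    def keys(self) -> Iterable[str]:
        return self._values.keys()

    def __getitem__(self, key: str, /) -> str:
        return self._values[key]


def to_dict(unpackable: DoubleStarUnpackable[str, str]) -> dict[str, str]:
    # The ** operator only relies on keys() and __getitem__ here.
    return {**unpackable}


print(to_dict(Settings()))
print(to_dict({"user": "alice"}))
```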

Digging deeper

This topic is not as straightforward as you might think at first glance. To find out what an appropriate type annotation for "unpackable" objects is, we need to dig into a few different sources.


What do you actually need for unpacking?

Since PEP 448, the unpacking operators * and ** can be used in several different contexts. The restriction on the ** operand type is explicitly mentioned in the official Expressions documentation:

> A double asterisk ** denotes dictionary unpacking. Its operand must be a mapping.

The term mapping is further defined as a

> container object that supports arbitrary key lookups and implements the methods specified in [...] collections.abc.Mapping [...].

To see what the specific methods of a collections ABC are, I find it most helpful to check the table of abstract base classes in the collections.abc documentation.

But what may be surprising (at least it was for me) is that not all of those are actually necessary for unpacking to work at runtime.[1] A bit of experimentation shows which methods are necessary. It turns out that all you need is a __getitem__ and a keys implementation.

Minimal example:

class Foo:
    def __getitem__(self, item: str) -> int:
        if item != "a":
            raise KeyError
        return 1

    def keys(self) -> str:
        return "a"

Demo:

def f(a: int) -> None:
    print(f"{a=}")


f(**Foo())

print({**Foo()})

Output:

a=1
{'a': 1}

You will notice that this also passes mypy --strict without errors.

But as soon as you remove either of those two methods, you will get an error both from mypy and at runtime.
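
The runtime half of that claim can be checked with a small sketch. The class names MissingKeys and MissingGetItem and the helper try_unpack are hypothetical, and the exact error messages may differ between Python versions, so only the exception type is relied on here.

```python
from typing import Any


class MissingKeys:
    """Has __getitem__ but no keys method."""

    def __getitem__(self, item: str) -> int:
        return 1


class MissingGetItem:
    """Has keys but no __getitem__ method."""

    def keys(self) -> list[str]:
        return ["a"]


def try_unpack(obj: Any) -> str:
    # Attempt dictionary unpacking and report what happened.
    try:
        return repr({**obj})
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_unpack({"a": 1}))
print(try_unpack(MissingKeys()))
print(try_unpack(MissingGetItem()))
```

In CPython, both incomplete classes end up raising a TypeError, while the plain dict unpacks normally.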


So how do we annotate it?

It turns out that the good people at python/typeshed are also aware of this (no surprise here) and have defined a protocol for just that and called it SupportsKeysAndGetItem:

from collections.abc import Iterable
from typing import Protocol, TypeVar

_KT = TypeVar("_KT")
_VT_co = TypeVar("_VT_co", covariant=True)


class SupportsKeysAndGetItem(Protocol[_KT, _VT_co]):
    def keys(self) -> Iterable[_KT]: ...
    def __getitem__(self, __key: _KT) -> _VT_co: ...

This obviously works just fine with our silly Foo class and we can use it to annotate your function like this:

def my_function(unpackable: SupportsKeysAndGetItem[str, int]) -> None:
    unpacked = {**unpackable}
    print(f"{unpacked=}")


my_function(Foo())  # unpacked={'a': 1}

Again, mypy accepts this code without errors.
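
One practical detail if you would rather import the typeshed protocol than copy its definition: to my knowledge it lives in the stub-only _typeshed module, which does not exist at runtime, so the import has to be guarded by typing.TYPE_CHECKING. The following is a minimal sketch under that assumption; the function name merge_into_dict is made up for illustration.

```python
from __future__ import annotations  # annotations are not evaluated at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers see this import; _typeshed is not importable at runtime.
    from _typeshed import SupportsKeysAndGetItem


def merge_into_dict(unpackable: SupportsKeysAndGetItem[str, int]) -> dict[str, int]:
    # Works for anything that provides keys() and __getitem__.
    return {**unpackable}


print(merge_into_dict({"a": 1}))
```

If the dependency on a private stub module feels too fragile, defining your own protocol as in the TL;DR avoids it.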

We can actually see that mypy uses this exact protocol from typeshed to check whether something is "unpackable" or not. If we omit either the keys or the __getitem__ method from Foo and try to do {**Foo()}, the error message from mypy will tell us:

List item 0 has incompatible type "Foo"; expected "SupportsKeysAndGetItem[<nothing>, <nothing>]"

(Not sure what lists have to do with this, but the relevant bit here is that it tells us it expects something that implements the SupportsKeysAndGetItem protocol.)


Why not just use Mapping?

You could do that, of course. In most situations that is just fine, and it is exactly how I would annotate something that is supposed to be unpacked at some point. But there are still two relevant caveats to keep in mind.

Mapping is not a protocol!

Unlike other collections abstract base classes such as Iterable, Container or Reversible, the collections.abc.Mapping class is not actually a protocol. The classes that are protocols are all listed in PEP 544 as well as in the mypy documentation.

The consequence is that structural subtyping will not work.

Even if I wrote a class Foo that implements all the Mapping methods (i.e. __getitem__, __iter__, __len__, as well as the inherited __contains__, keys, items, values, get, __eq__, and __ne__), a type checker would still complain if I tried to do m: Mapping = Foo().

Only nominal subtyping (i.e. inheriting from Mapping) would make this work. There is another Stack Overflow question about this exact topic.
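
The same distinction is visible at runtime, which matches the isinstance result for Series mentioned in the question. Here is a small sketch with two hypothetical classes, StructuralOnly and Nominal, that differ only in whether they inherit from Mapping:

```python
from collections.abc import Iterator, Mapping


class StructuralOnly:
    """Provides mapping-like methods but does not inherit from Mapping."""

    def __init__(self) -> None:
        self._data = {"a": 1}

    def __getitem__(self, key: str) -> int:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def keys(self) -> Iterator[str]:
        return iter(self._data)


class Nominal(Mapping[str, int]):
    """Inherits from Mapping; keys, items, get, etc. come from the ABC."""

    def __init__(self) -> None:
        self._data = {"a": 1}

    def __getitem__(self, key: str) -> int:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


# Both can be unpacked with **, but only one counts as a Mapping.
print({**StructuralOnly()}, isinstance(StructuralOnly(), Mapping))
print({**Nominal()}, isinstance(Nominal(), Mapping))
```

Both objects unpack to the same dict, yet only the class that inherits from Mapping passes the isinstance check.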

pandas.Series is unpackable, but not a Mapping subtype

Since you brought up the Series class in your question, I am using it here as a stand-in for any class that you could use for unpacking.

Annotating your function like this would not allow a Series argument to be passed:

from collections.abc import Mapping

def my_function(unpackable: Mapping) -> None: ...

For example, mypy would complain if you called my_function(pd.Series()):

Argument 1 to "my_function" has incompatible type "Series[Any]"; expected "Mapping[Any, Any]"

So you would have to resort to specifically defining a union for the annotation, like Anton Petrov suggested in his answer.

But then what if someone would like to pass something that is neither a Mapping subclass, nor a pd.Series, but still unpackable?

This is basically the argument for making your function parameter type annotations as wide as possible.
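
If a runtime check is wanted in addition to the static annotation, one option (not part of the original answer) is to mark such a protocol with typing.runtime_checkable. Note that isinstance then only checks that the methods exist, not their signatures. A minimal sketch, with is_unpackable and Config as hypothetical names:

```python
from collections.abc import Hashable, Iterable
from typing import Any, Protocol, TypeVar, runtime_checkable

K = TypeVar("K", bound=Hashable)
V = TypeVar("V", covariant=True)


@runtime_checkable
class DoubleStarUnpackable(Protocol[K, V]):
    def keys(self) -> Iterable[K]: ...
    def __getitem__(self, key: K, /) -> V: ...


class Config:
    """Hypothetical class that is unpackable without being a Mapping."""

    def keys(self) -> list[str]:
        return ["debug"]

    def __getitem__(self, key: str, /) -> bool:
        return True


def is_unpackable(obj: Any) -> bool:
    # Only the presence of keys and __getitem__ is checked here.
    return isinstance(obj, DoubleStarUnpackable)


print(is_unpackable({"a": 1}))   # dict has both methods
print(is_unpackable(Config()))   # custom class has both methods
print(is_unpackable([1, 2, 3]))  # list has __getitem__ but no keys
```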


Footnotes

[1] At least in the current CPython implementation. I could not find specific documentation for this.

Answer 2

Score: 1

The only requirement for the unpackable type is to follow the Mapping protocol, but it doesn't mean it should be inherited from Mapping.

So Mapping should be enough in most cases, but if you want to be more descriptive, and considering you care about Series specifically, you could create an Unpackable alias:

Unpackable = Union[Mapping, Series]

(Struck out by the author:) But, actually, mypy will be happy if you provide Series for the Mapping type, so it's just a matter of what seems more readable to you.

huangapple
  • Posted on 2023-05-17 06:42:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76267519.html