Type hinting numpy arrays and batches


Question


I'm trying to create a few array types for a scientific Python project. So far, I have created generic types for 1D, 2D and ND NumPy arrays:

from typing import Any, Generic, Protocol, Tuple, TypeVar

import numpy as np
from numpy.typing import _DType, _GenericAlias

Vector = _GenericAlias(np.ndarray, (Tuple[int], _DType))
Matrix = _GenericAlias(np.ndarray, (Tuple[int, int], _DType))
Tensor = _GenericAlias(np.ndarray, (Tuple[int, ...], _DType))

The first issue is that mypy says that Vector, Matrix and Tensor are not valid types (e.g. when I try myvar: Vector[int] = np.array([1, 2, 3])).

The second issue is that I'd like to create a generic type Batch that I'd like to use like so: Batch[Vector[complex]] should be like Matrix[complex], Batch[Matrix[float]] should be like Tensor[float] and Batch[Tensor[int]] should be like Tensor[int]. I am not sure what I mean by "should be like"; I guess I mean that mypy should not complain.

How do I go about this?

Answer 1

Score: 2


You should not be using protected members (names starting with an underscore) from the outside. They are typically marked this way to indicate implementation details that may change in the future, which is exactly what happened here between versions of numpy. For example, in 1.24 your import line from numpy.typing fails at runtime because the members you try to import are no longer there.
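
If all you need is the dtype and not the shape, the public numpy.typing.NDArray alias covers that without touching anything private. A minimal sketch, assuming NumPy 1.21 or newer; the normalize function is a made-up example, not something from the question:

```python
import numpy as np
import numpy.typing as npt


def normalize(x: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    """Scale x to unit Euclidean norm (hypothetical helper for illustration)."""
    return x / np.linalg.norm(x)


unit = normalize(np.array([3.0, 4.0]))
print(unit)
```

NDArray only parameterizes the dtype, so it says nothing about the number of dimensions.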


There is no need to use internal alias constructors because numpy.ndarray is already generic in terms of the array shape and its dtype. You can construct your own type aliases fairly easily. You just need to ensure you parameterize the dtype correctly. Here is a working example:

from typing import Tuple, TypeVar

import numpy as np


T = TypeVar("T", bound=np.generic, covariant=True)

Vector = np.ndarray[Tuple[int], np.dtype[T]]
Matrix = np.ndarray[Tuple[int, int], np.dtype[T]]
Tensor = np.ndarray[Tuple[int, ...], np.dtype[T]]
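
To see what such an alias turns into once you fill in the type variable, you can inspect it at runtime. This is just a sketch, assuming Python 3.9+ and a NumPy version where np.ndarray and np.dtype are subscriptable at runtime (1.22 or newer, as far as I know):

```python
from typing import Tuple, TypeVar, get_args, get_origin

import numpy as np

T = TypeVar("T", bound=np.generic, covariant=True)
Vector = np.ndarray[Tuple[int], np.dtype[T]]

# Substituting T yields an ordinary parameterized ndarray alias.
concrete = Vector[np.complex64]
shape_arg, dtype_arg = get_args(concrete)

print(get_origin(concrete))  # the class behind the alias
print(shape_arg)             # the shape parameter
print(dtype_arg)             # the dtype parameter with T filled in
```

Nothing here creates a new class; Vector[np.complex64] is only a parameterized view of np.ndarray for the type checker.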

Usage:

def f(v: Vector[np.complex64]) -> None:
    print(v[0])


def g(m: Matrix[np.float64]) -> None:  # np.float_ was removed in NumPy 2.0
    print(m[0])


def h(t: Tensor[np.int32]) -> None:
    print(t.reshape((1, 4)))


f(np.array([0j+1]))  # prints (1+0j)
g(np.array([[3.14, 0.], [1., -1.]]))  # prints [3.14 0.  ]
h(np.array([[3.14, 0.], [1., -1.]]))  # prints [[ 3.14  0.    1.   -1.  ]]
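
Keep in mind that the annotations are not checked at runtime. The calls above run fine even though the dtypes of the arrays passed in do not match the annotated ones (complex128 instead of complex64, float64 instead of int32). If you want the data to agree with the annotation, a sketch of how to request the dtype explicitly:

```python
import numpy as np

# Python complex numbers become complex128 unless a dtype is given.
a = np.array([0j + 1])
print(a.dtype)

# Asking for the dtype explicitly keeps the data in line with the annotation.
b = np.array([0j + 1], dtype=np.complex64)
ints = np.array([[1, 2], [3, 4]], dtype=np.int32).reshape((1, 4))
print(b.dtype, ints.dtype, ints.shape)
```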

The issue currently is that shapes have almost no typing support, but work is underway to implement that using the new TypeVarTuple capabilities provided by PEP 646. Until then, there is little practical use in discriminating the types by shape.
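
In the meantime, if the number of dimensions actually matters, a runtime check is one way to enforce it. A minimal sketch; require_ndim is a hypothetical helper name, not part of NumPy:

```python
import numpy as np


def require_ndim(a: np.ndarray, ndim: int) -> np.ndarray:
    """Return a unchanged if it has exactly ndim dimensions, else raise ValueError."""
    if a.ndim != ndim:
        raise ValueError(f"expected {ndim}-D array, got shape {a.shape}")
    return a


vec = require_ndim(np.array([1, 2, 3]), 1)  # passes, the array is 1-D

try:
    require_ndim(np.zeros((2, 2)), 1)  # fails, the array is 2-D
except ValueError as exc:
    message = str(exc)
    print(message)
```

This does nothing for the static checker, it only catches a wrong shape when the code runs.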


The batch issue should be a separate question. Try to ask one question at a time.

huangapple
  • Posted on February 19, 2023 at 01:42:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75495212.html