Indexing structured NumPy arrays with nested flexible types (np.void type)

Question

I would like to index into a structured NumPy array that has a nested dtype. For example,

a = np.array([((1.,2.),1),((2.,1.),2)], dtype=[('blah',[('beep','f'),('boop','f')]),('num','i')])

I can access 'beep' and 'boop' like this,

>>> a['blah'][['beep','boop']]
array([(1., 2.), (2., 1.)], dtype=[('beep', '<f4'), ('boop', '<f4')])

but is there some way I can do this without first indexing 'blah' separately? For example, I would like to do

>>> a[['blah',['beep','boop']]]

but this returns,

<stdin>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

In general, I want to be able to index into a structured array like this for any possible type nesting, without hard-coding any knowledge of how deep the nesting is (or even whether it is nested at all).

Answer 1

Score: 2

I would suggest recursively exploring the dtype to get the columns of interest.

This example walks every level of the dtype and prints each column it finds that is named either beep or boop.

def recursively_get_columns(array, path=()):
    # Field names at this level of the (possibly nested) dtype
    names = array.dtype.names
    for name in names:
        col = array[name]
        # A field without sub-fields is a leaf column
        is_simple = col.dtype.names is None
        new_path = path + (name,)
        if is_simple:
            yield new_path, col
        else:
            # Descend into the nested struct, extending the path
            yield from recursively_get_columns(col, new_path)

for path, col in recursively_get_columns(a):
    if path[-1] in ['beep', 'boop']:
        print(path, col)

Outputs:

('blah', 'beep') [1. 2.]
('blah', 'boop') [2. 1.]
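Building on the generator above, the matching columns could be gathered into one flat structured array instead of being printed. This is only a sketch under a few assumptions: the helper names `iter_leaf_columns` and `select_leaves` are made up for illustration, leaf field names are assumed to be unique across the whole dtype, and fields with a sub-array shape are not handled.

```python
import numpy as np


def iter_leaf_columns(array, path=()):
    """Yield (path, column) for every non-nested field, at any depth."""
    for name in array.dtype.names or ():
        col = array[name]
        new_path = path + (name,)
        if col.dtype.names is None:
            yield new_path, col
        else:
            yield from iter_leaf_columns(col, new_path)


def select_leaves(array, wanted):
    """Copy the leaf fields named in `wanted` into a new flat structured array.

    Hypothetical helper: assumes unique leaf names and no sub-array fields.
    """
    found = [(path[-1], col)
             for path, col in iter_leaf_columns(array)
             if path[-1] in wanted]
    out = np.empty(array.shape, dtype=[(name, col.dtype) for name, col in found])
    for name, col in found:
        out[name] = col  # copies the data field by field
    return out


a = np.array(
    [((1., 2.), 1), ((2., 1.), 2)],
    dtype=[('blah', [('beep', 'f'), ('boop', 'f')]), ('num', 'i')],
)
picked = select_leaves(a, {'beep', 'boop'})
print(picked.dtype.names)
print(picked['beep'], picked['boop'])
```

The caller never has to mention 'blah' or know how deep the nesting goes. The result is a copy, so writing to it does not change the original array.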

Answer 2

Score: 2

Exploring recfunctions:

https://numpy.org/devdocs/user/basics.rec.html#module-numpy.lib.recfunctions

In [104]: import numpy.lib.recfunctions as rf

In [105]: a = np.array([((1.,2.),1),((2.,1.),2)], dtype=[('blah',[('beep','f'),('boop','f')]),('num','i')])

In [106]: a.dtype
Out[106]: dtype([('blah', [('beep', '<f4'), ('boop', '<f4')]), ('num', '<i4')])

In [107]: rf.flatten_descr(a.dtype)
Out[107]: 
(('beep', dtype('float32')),
 ('boop', dtype('float32')),
 ('num', dtype('int32')))

In [108]: a.dtype.names
Out[108]: ('blah', 'num')

In [110]: rf.get_names(a.dtype)
Out[110]: (('blah', ('beep', 'boop')), 'num')

In [111]: rf.get_names_flat(a.dtype)
Out[111]: ('blah', 'beep', 'boop', 'num')

Here rf.flatten_descr flattens the nested dtype description, rf.get_names returns the field names with their nesting preserved, and rf.get_names_flat returns them as a flat tuple, whereas a.dtype.names only lists the top-level fields.

This isn't a heavily used library; I think it was written back in the days when recarray was more common. Now we mostly use the structured array form, or pandas. Mostly these functions explore the dtype and its names and, when making something new, create a 'blank' array and copy data into it by field name.
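The same kind of dtype exploration can be done by hand to recover the full path to each leaf field, which is the piece of information that rf.get_names_flat drops. The following is a rough sketch rather than anything recfunctions provides: `field_paths` and `get_by_path` are hypothetical helper names, and the lookup table assumes leaf names are unique.

```python
from functools import reduce

import numpy as np


def field_paths(dtype, path=()):
    """Yield the path (tuple of field names) to every leaf field of a dtype."""
    if dtype.names is None:
        # No sub-fields: this is a leaf
        yield path
        return
    for name in dtype.names:
        yield from field_paths(dtype[name], path + (name,))


def get_by_path(array, path):
    """Index one field name at a time; each step returns a view."""
    return reduce(lambda arr, name: arr[name], path, array)


a = np.array(
    [((1., 2.), 1), ((2., 1.), 2)],
    dtype=[('blah', [('beep', 'f'), ('boop', 'f')]), ('num', 'i')],
)

# Map each leaf name to its full path (assumes leaf names are unique)
paths = {p[-1]: p for p in field_paths(a.dtype)}
print(paths)

beep = get_by_path(a, paths['beep'])
print(beep)
```

Because only the dtype is inspected, the paths can be computed once and reused for any array with that dtype. Indexing with a single field name returns a view, so assigning into `beep` also modifies `a`.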

huangapple
  • Posted on August 11, 2023, 01:18:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76877985.html