August 11, 2023 01:18:20

Indexing structured numpy arrays with nested flexible types (np.void type)

Question

I would like to index into a structured numpy array with a nested dtype. For example,

import numpy as np
a = np.array([((1.,2.),1),((2.,1.),2)], dtype=[('blah',[('beep','f'),('boop','f')]),('num','i')])

I can access 'beep' and 'boop' like this,

>>> a['blah'][['beep','boop']]
array([(1., 2.), (2., 1.)], dtype=[('beep', '<f4'), ('boop', '<f4')])

but is there some way I can do this without first indexing 'blah' separately? For example, I would like to do

>>> a[['blah',['beep','boop']]]

but this raises,

<stdin>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

In general, I want to be able to index into a structured array like this for any possible dtype nesting, without hard-coding any knowledge of how deep the nesting is (or even whether it's nested at all).

Answer 1

Score: 2

I would suggest recursively exploring the dtype to get the columns of interest.

This example examines each level of the dtype and prints every column it finds that is named either beep or boop.

def recursively_get_columns(array, path=()):
    """Yield (path, column) for every non-nested field, at any depth."""
    for name in array.dtype.names:
        col = array[name]
        new_path = path + (name,)
        if col.dtype.names is None:
            # Plain field: report it with the full path of names leading to it.
            yield new_path, col
        else:
            # Nested structured field: descend into it.
            yield from recursively_get_columns(col, new_path)

for path, col in recursively_get_columns(a):
    if path[-1] in ['beep', 'boop']:
        print(path, col)

Output:

('blah', 'beep') [1. 2.]
('blah', 'boop') [2. 1.]
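
If the goal is to fetch the columns rather than print them, the same recursion can run over the dtype alone and return tuple paths, which are then followed one field at a time. The following is a minimal sketch of that idea; `find_leaf_paths` and `get_by_path` are hypothetical helper names chosen for illustration, not NumPy functions.

```python
from functools import reduce

import numpy as np


def find_leaf_paths(dtype, wanted, path=()):
    """Yield the tuple path of every non-nested field whose name is in `wanted`."""
    for name in dtype.names or ():
        sub = dtype[name]  # dtype of this field
        new_path = path + (name,)
        if sub.names is None:
            if name in wanted:
                yield new_path
        else:
            # Nested structured field: keep descending.
            yield from find_leaf_paths(sub, wanted, new_path)


def get_by_path(array, path):
    """Index `array` by each field name in `path`, outermost first."""
    return reduce(lambda arr, name: arr[name], path, array)


a = np.array([((1., 2.), 1), ((2., 1.), 2)],
             dtype=[('blah', [('beep', 'f'), ('boop', 'f')]), ('num', 'i')])

paths = list(find_leaf_paths(a.dtype, {'beep', 'boop'}))
columns = {path: get_by_path(a, path) for path in paths}
for path, col in columns.items():
    print(path, col)
```

Because single-field indexing returns a view, each entry in `columns` refers to the data in `a` rather than to a copy.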

Answer 2

Score: 2

Exploring recfunctions:

https://numpy.org/devdocs/user/basics.rec.html#module-numpy.lib.recfunctions

In [104]: import numpy.lib.recfunctions as rf
In [105]: a = np.array([((1.,2.),1),((2.,1.),2)], dtype=[('blah',[('beep','f'),('boop','f')]),('num','i')])
In [106]: a.dtype
Out[106]: dtype([('blah', [('beep', '<f4'), ('boop', '<f4')]), ('num', '<i4')])
In [107]: rf.flatten_descr(a.dtype)
Out[107]: 
(('beep', dtype('float32')),
 ('boop', dtype('float32')),
 ('num', dtype('int32')))
In [108]: a.dtype.names
Out[108]: ('blah', 'num')
In [110]: rf.get_names(a.dtype)
Out[110]: (('blah', ('beep', 'boop')), 'num')
In [111]: rf.get_names_flat(a.dtype)
Out[111]: ('blah', 'beep', 'boop', 'num')
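
The same module also has `rf.get_fieldstructure`, which maps each field name to the list of its parent fields. As a rough sketch, that mapping can be used to look up a field by name at any depth; `field_by_name` is a hypothetical helper, and the approach assumes field names are unique across all nesting levels, since the mapping is keyed by name alone.

```python
import numpy as np
import numpy.lib.recfunctions as rf


def field_by_name(array, name):
    """Return the field called `name`, however deeply it is nested."""
    # e.g. {'blah': [], 'beep': ['blah'], 'boop': ['blah'], 'num': []}
    parents = rf.get_fieldstructure(array.dtype)[name]
    for parent in parents:
        array = array[parent]  # walk down from the outermost field
    return array[name]


a = np.array([((1., 2.), 1), ((2., 1.), 2)],
             dtype=[('blah', [('beep', 'f'), ('boop', 'f')]), ('num', 'i')])

for name in ['beep', 'boop', 'num']:
    print(name, field_by_name(a, name))
```

A name that does not appear anywhere in the dtype raises `KeyError` from the dictionary lookup.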

recfunctions isn't a heavily used library; I think it was written back in the days when recarray was more common. Now we mostly use plain structured arrays, or pandas. Most of these functions explore the dtype and its names and, when making something new, create a 'blank' array and copy data into it by field name.
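
The 'blank and copy' pattern can be sketched as follows: build a flat dtype with `rf.flatten_descr`, allocate an empty array of that dtype, and copy each leaf field across by name. `copy_leaves` is a hypothetical helper written for this example, and the sketch assumes leaf field names are unique, because a dtype cannot contain duplicate field names.

```python
import numpy as np
import numpy.lib.recfunctions as rf

a = np.array([((1., 2.), 1), ((2., 1.), 2)],
             dtype=[('blah', [('beep', 'f'), ('boop', 'f')]), ('num', 'i')])

# Flat dtype holding only the leaf fields: beep, boop, num.
flat_dtype = np.dtype(list(rf.flatten_descr(a.dtype)))
flat = np.zeros(a.shape, dtype=flat_dtype)  # the 'blank'


def copy_leaves(src, dst):
    """Copy every non-nested field of `src` into the same-named field of `dst`."""
    for name in src.dtype.names:
        if src.dtype[name].names is None:
            dst[name] = src[name]
        else:
            copy_leaves(src[name], dst)


copy_leaves(a, flat)

# With the nesting gone, ordinary multi-field indexing works.
print(flat[['beep', 'boop']])
```

Unlike indexing into `a` directly, `flat` is a copy, so later changes to `a` are not reflected in it.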
