Indexing structured numpy arrays with nested flexible types (np.void type)
Question
I would like to index into a structured numpy array with a nested dtype. For example,
import numpy as np
a = np.array([((1.,2.),1),((2.,1.),2)], dtype=[('blah',[('beep','f'),('boop','f')]),('num','i')])
I can access 'beep' and 'boop' like this,
>>> a['blah'][['beep','boop']]
array([(1., 2.), (2., 1.)], dtype=[('beep', '<f4'), ('boop', '<f4')])
but is there some way I can do this without first indexing 'blah' separately? For example, I would like to do
>>> a[['blah',['beep','boop']]]
but this returns,
<stdin>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
In general, I want to be able to index into a structured array like this for any possible type nesting, without any hard-coded knowledge of how deep the nesting is (or even whether it is nested at all).
Answer 1
Score: 2
I would suggest recursively exploring the dtype to get the columns of interest.
This example examines each level of the dtype and prints every column it finds named either beep or boop.
def recursively_get_columns(array, path=()):
    # Walk every field of the structured dtype, descending into nested fields.
    names = array.dtype.names
    for name in names:
        col = array[name]
        is_simple = col.dtype.names is None
        new_path = path + (name,)
        if is_simple:
            # Leaf field: yield its full path and its data.
            yield new_path, col
        else:
            # Nested structured field: recurse one level deeper.
            yield from recursively_get_columns(col, new_path)

for path, col in recursively_get_columns(a):
    if path[-1] in ['beep', 'boop']:
        print(path, col)
Outputs:
('blah', 'beep') [1. 2.]
('blah', 'boop') [2. 1.]
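To get closer to what the question asks for (a single array holding the selected fields, whatever their depth), the same recursive walk can feed a new structured array. The following is only a sketch: the helper names iter_leaf_fields and select_leaf_fields are made up for this illustration and are not part of NumPy, and it assumes that leaf field names are unique across nesting levels and that no field is a subarray.

```python
import numpy as np

def iter_leaf_fields(array, path=()):
    """Yield (path, column) for every non-nested field, at any depth."""
    # dtype.names is None for a plain (non-structured) array.
    for name in array.dtype.names or ():
        col = array[name]
        new_path = path + (name,)
        if col.dtype.names is None:
            yield new_path, col
        else:
            yield from iter_leaf_fields(col, new_path)

def select_leaf_fields(array, wanted):
    """Copy the leaf fields named in `wanted` into a new, flat structured array.

    Hypothetical helper: assumes unique leaf names and no subarray fields.
    """
    found = [(path[-1], col)
             for path, col in iter_leaf_fields(array)
             if path[-1] in wanted]
    out_dtype = np.dtype([(name, col.dtype) for name, col in found])
    out = np.empty(array.shape, dtype=out_dtype)
    for name, col in found:
        out[name] = col  # copy data field by field
    return out

a = np.array([((1., 2.), 1), ((2., 1.), 2)],
             dtype=[('blah', [('beep', 'f'), ('boop', 'f')]), ('num', 'i')])
picked = select_leaf_fields(a, ['beep', 'boop'])
print(picked.dtype.names)               # ('beep', 'boop')
print(picked['beep'], picked['boop'])   # [1. 2.] [2. 1.]
```

Note that this returns a copy rather than a view, so writing to picked does not change a.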
Answer 2
Score: 2
Exploring recfunctions:
https://numpy.org/devdocs/user/basics.rec.html#module-numpy.lib.recfunctions
In [104]: import numpy.lib.recfunctions as rf
In [105]: a = np.array([((1.,2.),1),((2.,1.),2)], dtype=[('blah',[('beep','f'),('boop','f')]),('num','i')])
In [106]: a.dtype
Out[106]: dtype([('blah', [('beep', '<f4'), ('boop', '<f4')]), ('num', '<i4')])
In [107]: rf.flatten_descr(a.dtype)
Out[107]:
(('beep', dtype('float32')),
 ('boop', dtype('float32')),
 ('num', dtype('int32')))
In [108]: a.dtype.names
Out[108]: ('blah', 'num')
In [110]: rf.get_names(a.dtype)
Out[110]: (('blah', ('beep', 'boop')), 'num')
In [111]: rf.get_names_flat(a.dtype)
Out[111]: ('blah', 'beep', 'boop', 'num')
This isn't a heavily used library; I think it was written back in the days when recarray was more common. Now we mostly use the structured array form, or pandas. These functions mostly explore the dtype and its names and, when making something new, create a 'blank' array and copy data by field name.
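The "create a blank and copy data by field name" pattern can be sketched for this question as follows. This is an illustration, not a recfunctions API: flatten_nested is a hypothetical helper, rf.flatten_descr is the function shown in the session above, and the sketch assumes a structured input whose leaf field names are unique.

```python
import numpy as np
import numpy.lib.recfunctions as rf

def flatten_nested(array):
    """Return a copy of `array` with all nested fields lifted to the top level.

    Hypothetical helper: assumes a structured dtype with unique leaf names.
    """
    # Build the flat dtype from the (name, dtype) pairs of the leaf fields.
    flat_dtype = np.dtype(list(rf.flatten_descr(array.dtype)))
    out = np.empty(array.shape, dtype=flat_dtype)  # the 'blank'

    def copy_leaves(src):
        # Copy each leaf field by name, recursing into nested fields.
        for name in src.dtype.names:
            col = src[name]
            if col.dtype.names is None:
                out[name] = col
            else:
                copy_leaves(col)

    copy_leaves(array)
    return out

a = np.array([((1., 2.), 1), ((2., 1.), 2)],
             dtype=[('blah', [('beep', 'f'), ('boop', 'f')]), ('num', 'i')])
flat = flatten_nested(a)
print(flat.dtype.names)  # ('beep', 'boop', 'num')
sub = flat[['beep', 'boop']]
```

Once the array is flat, ordinary multi-field indexing such as flat[['beep', 'boop']] works without knowing how deep the original nesting was.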