July 10, 2023, 14:36:48

numpy with a list of Dict: Syntax to filter elements?

Question

Say I have a NumPy array in which each element is a dict:

import math
import numpy as np

data = [
    {
        'Account': '111',
        'RIC': 'AAPL.OQ',
        'Position': 100,
        'isActive': True,
        'Rating': math.nan
    },
    {
        'Account': '111',
        'RIC': 'MSFT.OQ',
        'Position': 200,
        'isActive': False,
        'Rating': 73
    },
    {
        'Account': '111',
        'RIC': 'IBM.N',
        'Position': 300,
        'isActive': True,
        'Rating': math.inf
    },
    {
        'Account': '222',
        'RIC': 'AAPL.OQ',
        'Position': 1000,
        'isActive': False,
        'Rating': 89
    },
    {
        'Account': '222',
        'RIC': 'MSFT.OQ',
        'Position': 2000,
        'isActive': True,
        'Rating': np.nan
    },
    {
        'Account': '222',
        'RIC': 'IBM.N',
        'Position': 3000,
        'isActive': True,
        'Rating': 59
    }
]
# A list of dicts becomes a 1-D array of dtype=object
data = np.array(data)

How do I filter it, for example to return only the elements where isActive == True?

Unlike pandas, NumPy doesn't support syntax like data[data.isActive == True] on this array.

I am looking for NumPy syntax, not for a solution that converts 'data' to a plain Python list (and then uses a list comprehension) or to pandas.

Thanks

Answer 1

Score: 1

As suggested in the comments, you can use a record array. Still, NumPy might not be the right tool for this.

# np.rec is the public alias of np.core.records (np.core is deprecated in NumPy 2.x)
rec = np.rec.fromrecords(
    [tuple(d.values()) for d in data],   # one tuple per dict, in key order
    names=list(data[0].keys()),          # field names taken from the first dict
    formats=[np.dtype("<U16"), np.dtype("<U16"), int, bool, float],
)

>>> rec[rec.isActive == True]
rec.array([('111', 'AAPL.OQ',  100,  True, nan),
           ('111', 'IBM.N',  300,  True, inf),
           ('222', 'MSFT.OQ', 2000,  True, nan),
           ('222', 'IBM.N', 3000,  True, 59.)],
          dtype=[('Account', '<U16'), ('RIC', '<U16'), ('Position', '<i8'), ('isActive', '?'), ('Rating', '<f8')])
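
A closely related option is a plain structured array, which supports the same kind of filtering through field indexing instead of attribute access. The following is only a minimal sketch: the records are a shortened, hypothetical version of the question's data, and the field types (including the U16 string width) are assumptions copied from the record-array example above.

```python
import numpy as np

# Hypothetical sample records, shortened from the question's data.
records = [
    {"Account": "111", "RIC": "AAPL.OQ", "Position": 100, "isActive": True, "Rating": float("nan")},
    {"Account": "111", "RIC": "MSFT.OQ", "Position": 200, "isActive": False, "Rating": 73},
    {"Account": "222", "RIC": "IBM.N", "Position": 3000, "isActive": True, "Rating": 59},
]

# One (name, type) pair per dict key; the string width U16 is an assumption.
dtype = [("Account", "U16"), ("RIC", "U16"), ("Position", "i8"),
         ("isActive", "?"), ("Rating", "f8")]

# Structured arrays are built from tuples, so convert each dict in field order.
names = [name for name, _ in dtype]
sarr = np.array([tuple(d[name] for name in names) for d in records], dtype=dtype)

# Indexing by field name returns a plain bool array, usable as a mask.
active = sarr[sarr["isActive"]]
print(active["RIC"])
```

Looking the values up by field name, rather than relying on d.values(), keeps the tuples correct even if some dict happens to list its keys in a different order.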

Answer 2

Score: 0

Your array (called arr here) is object dtype:

In [239]: arr
Out[239]:
array([{'Account': '111', 'RIC': 'AAPL.OQ', 'Position': 100, 'isActive': True, 'Rating': nan},
       {'Account': '111', 'RIC': 'MSFT.OQ', 'Position': 200, 'isActive': False, 'Rating': 73},
       {'Account': '111', 'RIC': 'IBM.N', 'Position': 300, 'isActive': True, 'Rating': inf},
       {'Account': '222', 'RIC': 'AAPL.OQ', 'Position': 1000, 'isActive': False, 'Rating': 89},
       {'Account': '222', 'RIC': 'MSFT.OQ', 'Position': 2000, 'isActive': True, 'Rating': nan},
       {'Account': '222', 'RIC': 'IBM.N', 'Position': 3000, 'isActive': True, 'Rating': 59}],
      dtype=object)

In such an array, each element is a reference to a Python object, in this case a dict. That is the same data layout as a list, and access works basically the same way (but is a bit slower):

A list comprehension:

In [240]: [a['isActive'] for a in arr]
Out[240]: [True, False, True, False, True, True]

We can then get the indices of the True entries with np.nonzero (or with another list comprehension); here _ is IPython's shorthand for the previous output:

In [241]: np.nonzero(_)
Out[241]: (array([0, 2, 4, 5]),)
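
To actually return the filtered elements, the boolean list or the indices still have to be applied to the array. A minimal sketch of that last step, using a shortened, hypothetical version of the question's data (the names mask, filtered, idx and same are illustrative only):

```python
import numpy as np

# Hypothetical, shortened version of the question's data.
arr = np.array([
    {"RIC": "AAPL.OQ", "isActive": True},
    {"RIC": "MSFT.OQ", "isActive": False},
    {"RIC": "IBM.N", "isActive": True},
], dtype=object)

# Build a real boolean mask from the per-element lookups.
mask = np.array([d["isActive"] for d in arr], dtype=bool)

# Boolean indexing keeps the elements where the mask is True.
filtered = arr[mask]

# Integer indexing with the positions from np.nonzero selects the same elements.
(idx,) = np.nonzero(mask)
same = arr[idx]

print(filtered)
```

The result is still an object-dtype array of dicts, so the per-element lookup remains a Python-level loop; only the final selection is done by NumPy.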

We can also construct a "vectorized" function to do this selection:

In [247]: np.frompyfunc(lambda x: x['isActive'], 1, 1)(arr)
Out[247]: array([True, False, True, False, True, True], dtype=object)
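
Note that the frompyfunc result has dtype=object, and as far as I know NumPy only accepts integer or boolean arrays as indices, so it needs a cast before it can serve as a mask. A small sketch under that assumption, again with hypothetical shortened data and illustrative names (get_active, flags, mask):

```python
import numpy as np

# Hypothetical, shortened version of the question's data.
arr = np.array([
    {"RIC": "AAPL.OQ", "isActive": True},
    {"RIC": "MSFT.OQ", "isActive": False},
    {"RIC": "IBM.N", "isActive": True},
], dtype=object)

# Wrap the dict lookup as a ufunc-like callable: 1 input, 1 output.
get_active = np.frompyfunc(lambda d: d["isActive"], 1, 1)

# The result holds Python bools but is stored with dtype=object.
flags = get_active(arr)

# Cast to a genuine boolean array so it can be used for boolean indexing.
mask = flags.astype(bool)
filtered = arr[mask]

print(filtered)
```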

For small arrays, the list comprehension is faster; for large ones, the frompyfunc approach may have a minor scaling advantage.

pandas constructs a separate array (a Series, i.e. a column) for each dict key, and allows selection by column name.

Getting the indices directly:

In [251]: [i for i, v in enumerate(arr) if v['isActive']]
Out[251]: [0, 2, 4, 5]
