numpy with a list of dicts: syntax to filter elements?

Question

Say I have a NumPy array in which each element is a dict:

import math

import numpy as np

data = [
    {
        'Account' : '111',
        'RIC' : 'AAPL.OQ',
        'Position' : 100,
        'isActive' : True,
        'Rating' : math.nan
    },
    {
        'Account' : '111',
        'RIC' : 'MSFT.OQ',
        'Position' : 200,
        'isActive' : False,
        'Rating' : 73
    },
    {
        'Account' : '111',
        'RIC' : 'IBM.N',
        'Position' : 300,
        'isActive' : True,
        'Rating' : math.inf
    },

    {
        'Account' : '222',
        'RIC' : 'AAPL.OQ',
        'Position' : 1000,
        'isActive' : False,
        'Rating' : 89
    },
    {
        'Account' : '222',
        'RIC' : 'MSFT.OQ',
        'Position' : 2000,
        'isActive' : True,
        'Rating' : np.nan
    },
    {
        'Account' : '222',
        'RIC' : 'IBM.N',
        'Position' : 3000,
        'isActive' : True,
        'Rating' : 59
    }
]
data = np.array(data)

How do I filter it, for example to return only the elements where isActive is True?

Unlike pandas, NumPy doesn't support syntax like data[data.isActive==True].

I am looking for NumPy syntax, not a solution that converts 'data' to a plain Python list (and then uses a list comprehension) or to pandas.

Thanks

Answer 1

Score: 1

As suggested in the comments, you can use a record array. Even so, NumPy might not be the right tool for this.

# np.rec is the public name for np.core.records (np.core is private since NumPy 2.0)
rec = np.rec.fromrecords(
    [tuple(d.values()) for d in data],
    names=list(data[0].keys()),
    formats=[np.dtype("<U16"), np.dtype("<U16"), int, bool, float],
)
>>> rec[rec.isActive == True]
rec.array([('111', 'AAPL.OQ',  100,  True, nan),
           ('111', 'IBM.N',  300,  True, inf),
           ('222', 'MSFT.OQ', 2000,  True, nan),
           ('222', 'IBM.N', 3000,  True, 59.)],
          dtype=[('Account', '<U16'), ('RIC', '<U16'), ('Position', '<i8'), ('isActive', '?'), ('Rating', '<f8')])
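
If the field types are known up front, a plain structured array can probably be filtered the same way with a boolean mask. The following is a minimal sketch and not part of the original answer; the shortened `data` list, the field widths in `dtype`, and the variable names are assumptions chosen for illustration.

```python
import math

import numpy as np

# Shortened stand-in for the question's list of dicts (assumption).
data = [
    {'Account': '111', 'RIC': 'AAPL.OQ', 'Position': 100, 'isActive': True, 'Rating': math.nan},
    {'Account': '111', 'RIC': 'MSFT.OQ', 'Position': 200, 'isActive': False, 'Rating': 73},
    {'Account': '222', 'RIC': 'IBM.N', 'Position': 3000, 'isActive': True, 'Rating': 59},
]

# One (name, type) pair per dict key; 'U16' holds strings of up to 16 characters.
dtype = [('Account', 'U16'), ('RIC', 'U16'), ('Position', 'i8'),
         ('isActive', '?'), ('Rating', 'f8')]

# Structured arrays are built from tuples, one per row, in field order.
structured = np.array([tuple(d[name] for name, _ in dtype) for d in data], dtype=dtype)

# Indexing by field name yields a real boolean array, usable as a mask.
active = structured[structured['isActive']]
print(active['RIC'])
```

Pulling values out by field name, rather than relying on `d.values()`, keeps the tuples in the declared order even if some dict was built with its keys in a different order.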

Answer 2

Score: 0

Your array is object dtype:

In [239]: arr
Out[239]: 
array([{'Account': '111', 'RIC': 'AAPL.OQ', 'Position': 100, 'isActive': True, 'Rating': nan},
       {'Account': '111', 'RIC': 'MSFT.OQ', 'Position': 200, 'isActive': False, 'Rating': 73},
       {'Account': '111', 'RIC': 'IBM.N', 'Position': 300, 'isActive': True, 'Rating': inf},
       {'Account': '222', 'RIC': 'AAPL.OQ', 'Position': 1000, 'isActive': False, 'Rating': 89},
       {'Account': '222', 'RIC': 'MSFT.OQ', 'Position': 2000, 'isActive': True, 'Rating': nan},
       {'Account': '222', 'RIC': 'IBM.N', 'Position': 3000, 'isActive': True, 'Rating': 59}],
      dtype=object)

In such an array, each element is a reference to a Python object, in this case a dict. That's the same data layout as a list, and access is basically the same (but a bit slower):

A list comprehension:

In [240]: [a['isActive'] for a in arr]
Out[240]: [True, False, True, False, True, True]

From that list we can get the indices of the True entries with nonzero (or with another list comprehension):

In [241]: np.nonzero(_)
Out[241]: (array([0, 2, 4, 5]),)
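
To return the matching dicts themselves rather than their positions, the flags can be turned into a boolean array and used as a mask. This is a minimal sketch, assuming a shortened `arr` built the same way as in the question; the names `mask`, `selected`, and `idx` are illustrative only.

```python
import numpy as np

# Shortened stand-in for the question's object-dtype array (assumption).
arr = np.array([
    {'RIC': 'AAPL.OQ', 'isActive': True},
    {'RIC': 'MSFT.OQ', 'isActive': False},
    {'RIC': 'IBM.N', 'isActive': True},
], dtype=object)

# The comprehension still runs in Python; dtype=bool makes the result a valid mask.
mask = np.array([d['isActive'] for d in arr], dtype=bool)

selected = arr[mask]         # still an object-dtype array of dicts
idx = np.nonzero(mask)[0]    # positions of the True entries
print([d['RIC'] for d in selected])
print(idx)
```

The per-element work is the same as in the list comprehension; only the final selection uses NumPy indexing.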

We can also construct a "vectorized" function to do this selection:

In [247]: np.frompyfunc(lambda x: x.__getitem__('isActive'),1,1)(arr)
Out[247]: array([True, False, True, False, True, True], dtype=object)
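
Because the `frompyfunc` result has dtype object, it needs a cast to bool before it can serve as a boolean index; NumPy does not accept an object-dtype array as an index. A minimal sketch, assuming a shortened `arr` like the one in the question (`get_active` is a hypothetical helper name):

```python
import numpy as np

# Shortened stand-in for the question's object-dtype array (assumption).
arr = np.array([
    {'RIC': 'AAPL.OQ', 'isActive': True},
    {'RIC': 'MSFT.OQ', 'isActive': False},
    {'RIC': 'IBM.N', 'isActive': True},
], dtype=object)

# frompyfunc(func, nin, nout) wraps a Python function so it is applied per element.
get_active = np.frompyfunc(lambda d: d['isActive'], 1, 1)

flags = get_active(arr)      # dtype=object
mask = flags.astype(bool)    # dtype=bool, usable for indexing
selected = arr[mask]
print([d['RIC'] for d in selected])
```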

For small arrays, list comprehension is faster; for large ones the frompyfunc approach may have a minor scaling advantage.

pandas, by contrast, constructs a separate array (a Series/column) for each dict key, which is what allows selection by column name.

Getting the indices directly:

In [251]: [i for i,v in enumerate(arr) if v['isActive']]
Out[251]: [0, 2, 4, 5]

huangapple
  • Posted on July 10, 2023, 14:36:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76651201.html