March 3, 2023, 22:44:02

How to group a NumPy ndarray and return the first row from each group, without sorting beforehand

Question

I have an ndarray:

[[1 1]
 [0 2]
 [0 3]
 [1 4]
 [1 5]
 [0 6]
 [1 7]]

I expect a reduced result like this:

[[1 1]
 [0 2]
 [1 4]
 [0 6]
 [1 7]]

The result should contain the first row of each group.
Groups are built from runs of consecutive equal values in column 0, which holds only 0 or 1.

A similar problem was solved in the thread "Is there any numpy group by function?",
but there the key was sorted, so that approach does not work in my case.

import numpy as np

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T
print(a)
values, indexes = np.unique(a[:, 0], return_index=True)
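
To make the limitation concrete, here is a minimal sketch on the same data showing what np.unique returns with return_index=True. The variable name rows is only illustrative. np.unique sorts the distinct keys and reports the first occurrence of each, so separate runs of the same value collapse into one entry.

```python
import numpy as np

a = np.array([[1, 0, 0, 1, 1, 0, 1],
              [1, 2, 3, 4, 5, 6, 7]]).T

values, indexes = np.unique(a[:, 0], return_index=True)
rows = a[indexes]

print(values)   # sorted distinct keys
print(indexes)  # index of the first occurrence of each key
print(rows)     # one row per distinct key, not one per consecutive run
```

Only two rows come back, one per distinct key, whereas the expected result has five rows, one per consecutive run.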

In pandas this can be done as follows (a solution from Stack Overflow; I do not remember the author, so sorry for the missing link):

m1 = (df['c0'] != df['c0'].shift(1)).cumsum()
df = df.groupby([df['c0'], m1]).head(1)

How can I do this with NumPy?

Thank you for solutions.

EDITED:

While mozway was writing a solution, I came up with something like this:

import numpy as np

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T

print("solution")
# "shift" for NumPy: prepend NaN and drop the last value of column 0
arr3 = np.array([np.nan])
arr4 = np.array(a[:-1, 0])
arr5 = np.concatenate([arr3, arr4])
print('arr5')
print(arr5)
# add the shifted column
a = np.c_[a, arr5]

# difference between column 0 and the shifted column
dif_col = np.where(a[:, 0] != a[:, 2], True, False)
# add the diff column
a = np.c_[a, dif_col]
# select only the rows where the value changed
mask = (a[:, 3] == True)
a = a[mask, :]
# remove the two helper columns
a = np.delete(a, 2, 1)
a = np.delete(a, 2, 1)
print(a)

Output:

[[1. 1.]
 [0. 2.]
 [1. 4.]
 [0. 6.]
 [1. 7.]]

What do you think?

Answer 1

Score: 4

You can compute the indices where the value changes:

idx = np.where(np.diff(a[:, 0])!=0)[0]

out = a[np.r_[0, idx+1]]

Output:

array([[1, 1],
       [0, 2],
       [1, 4],
       [0, 6],
       [1, 7]])
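
The same idea can be wrapped in a small reusable helper. This is only a sketch: the function name first_row_per_run and its key_col parameter are hypothetical and not part of NumPy. It compares neighbouring keys with != instead of np.diff, which is an assumption made so that keys which cannot be subtracted (for example strings) also work.

```python
import numpy as np

def first_row_per_run(a, key_col=0):
    """Return the first row of every run of equal consecutive values in a[:, key_col].

    Hypothetical helper for illustration; not a NumPy function.
    """
    a = np.asarray(a)
    if a.shape[0] == 0:
        return a  # nothing to group
    key = a[:, key_col]
    # Positions i where key[i] != key[i + 1]; the next run starts at i + 1.
    change = np.flatnonzero(key[1:] != key[:-1])
    # Row 0 always starts the first run.
    starts = np.r_[0, change + 1]
    return a[starts]

a = np.array([[1, 0, 0, 1, 1, 0, 1],
              [1, 2, 3, 4, 5, 6, 7]]).T
result = first_row_per_run(a)
print(result)
```

No sorting is involved, so the rows keep their original order.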

Minimum per group

I initially misread the question and thought you wanted the minimum per group. For that, combine the change indices with np.minimum.reduceat:

idx = np.where(np.diff(a[:, 0])!=0)[0]
out = np.minimum.reduceat(a, np.r_[0, idx+1], axis=0)

Example:

l1 = [1,1,0,0,1,1,0,1]
l2 = [1,0,3,2,4,5,6,7]
a = np.array([l1, l2]).T

idx = np.where(np.diff(a[:, 0])!=0)[0]
out = np.minimum.reduceat(a, np.r_[0, idx+1], axis=0)

array([[1, 0],
       [0, 2],
       [1, 4],
       [0, 6],
       [1, 7]])
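
For readers unfamiliar with reduceat, the following sketch spells out what it does on this example. With increasing start indices, each output row reduces the slice a[starts[i]:starts[i + 1]], and the last slice runs to the end of the array. The names starts, stops, mins_loop, maxs and sums are illustrative only.

```python
import numpy as np

a = np.array([[1, 1, 0, 0, 1, 1, 0, 1],
              [1, 0, 3, 2, 4, 5, 6, 7]]).T

# Start index of each run of equal values in column 0.
starts = np.r_[0, np.flatnonzero(np.diff(a[:, 0]) != 0) + 1]

# Column-wise minimum of every run.
mins = np.minimum.reduceat(a, starts, axis=0)

# Equivalent explicit loop, shown for comparison only.
stops = np.r_[starts[1:], len(a)]
mins_loop = np.array([a[s:e].min(axis=0) for s, e in zip(starts, stops)])

# Other binary ufuncs work the same way, e.g. per-run maximum or sum.
maxs = np.maximum.reduceat(a, starts, axis=0)
sums = np.add.reduceat(a, starts, axis=0)

print(starts)
print(mins)
```

Note that the reduction is applied to each column independently, so with more than two columns an output row is not necessarily a row that exists in the input.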

Sorting per group

Using np.lexsort:

group = np.r_[0, np.cumsum(np.diff(a[:, 0])!=0)]
out = a[np.lexsort(np.c_[a[:, 1:], group].T)]

Example:

l1 = [1,1,0,0,1,1,0,1]
l2 = [1,0,3,2,4,5,6,7]
a = np.array([l1, l2]).T

group = np.r_[0, np.cumsum(np.diff(a[:, 0])!=0)]
# array([0, 0, 1, 1, 2, 2, 3, 4])

out = a[np.lexsort(np.c_[a[:, 1:], group].T)]

array([[1, 0],
       [1, 1],
       [0, 2],
       [0, 3],
       [1, 4],
       [1, 5],
       [0, 6],
       [1, 7]])
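
The key order matters here: np.lexsort treats the last key as the primary one. The sketch below passes the keys as an explicit tuple, which is an equivalent formulation for this two-column case rather than the answer's exact expression. The name order is illustrative.

```python
import numpy as np

a = np.array([[1, 1, 0, 0, 1, 1, 0, 1],
              [1, 0, 3, 2, 4, 5, 6, 7]]).T

# Run label for every row: increases by one each time column 0 changes.
group = np.r_[0, np.cumsum(np.diff(a[:, 0]) != 0)]

# The LAST key is the primary one, so rows are ordered by group first
# and by column 1 within each group.
order = np.lexsort((a[:, 1], group))
out = a[order]

print(group)
print(order)
print(out)
```

Because group never decreases, the runs stay in their original order and only the rows inside each run are rearranged.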

Answer 2

Score: 1

Another possible solution, which is based on numpy.roll:

m = a[:, 0] != np.roll(a[:,0], 1)
m[0] = True
a[m, :]

Output:

array([[1, 1],
       [0, 2],
       [1, 4],
       [0, 6],
       [1, 7]])
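
The m[0] = True line is needed because np.roll is circular: the first element is compared with the last one, not with a real predecessor. A small sketch on the question's data makes this visible. The names key, shifted and wrapped_first are illustrative.

```python
import numpy as np

a = np.array([[1, 0, 0, 1, 1, 0, 1],
              [1, 2, 3, 4, 5, 6, 7]]).T

key = a[:, 0]
# np.roll wraps around: position 0 receives the LAST element of key.
shifted = np.roll(key, 1)
m = key != shifted

# Here key[0] == key[-1] == 1, so the wrapped comparison says "no change".
wrapped_first = bool(m[0])

# The first row always starts a group, so force it to True.
m[0] = True
out = a[m]

print(wrapped_first)
print(out)
```

Without the override, the first row would be dropped whenever the first and last keys happen to be equal, as they are in this example.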
