How to group a numpy ndarray and return the first row from each group, with no sort before

Question

I have an ndarray:

[[1 1]
 [0 2]
 [0 3]
 [1 4]
 [1 5]
 [0 6]
 [1 7]]

I expect a reduced result like this:

[[1 1]
 [0 2]
 [1 4]
 [0 6]
 [1 7]]

The resulting ndarray should contain the first row of each group.
I build the groups from the values in column 0, which are either 0 or 1; each run of consecutive equal values is one group.

A similar problem was solved in the thread "Is there any numpy group by function?",
but there the key was sorted, so that approach does not work in my case.

l1 = [1,0,0,1,1,0,1]
l2 = [1,2,3,4,5,6,7]
a = np.array([l1, l2]).T
print(a)
values, indexes = np.unique(a[:, 0], return_index=True)
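
To make the problem with this attempt concrete, here is a small check on the same array. It is only a sketch of what np.unique reports: one index per distinct key, in sorted key order, rather than one index per run of consecutive equal values.

```python
import numpy as np

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T

# np.unique sorts the distinct keys and reports the first occurrence of each,
# so separate runs of the same key collapse into a single group.
values, indexes = np.unique(a[:, 0], return_index=True)
print(values)      # [0 1]
print(indexes)     # [1 0]
print(a[indexes])  # only two rows: [0 2] and [1 1]
```

Only two rows come back instead of the five expected, and they are ordered by key rather than by position.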

In pandas we can achieve this as follows (a solution from Stack Overflow; I do not remember the author, sorry for the missing link):

m1 = ( df['c0'] != df['c0'].shift(1)).cumsum()
df = df.groupby([df['c0'], m1]).head(1)

How can I do this with numpy?

Thank you for any solutions.

EDIT:

While mozway was writing a solution, I came up with something like this:

import numpy as np

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T

print("solution")
# "shift" for numpy: column 0 moved down by one row, padded with NaN
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
arr3 = np.array([np.nan])
arr4 = np.array(a[:-1, 0])
arr5 = np.concatenate([arr3, arr4])
print('arr5')
print(arr5)
# add the shifted column (the NaN casts the whole array to float)
a = np.c_[a, arr5]

# difference between column 0 and the shifted column
dif_col = np.where(a[:, 0] != a[:, 2], True, False)
# add the diff column
a = np.c_[a, dif_col]
# select only the rows where the diff is True
mask = (a[:, 3] == True)
a = a[mask, :]
# remove the two helper columns
a = np.delete(a, 2, 1)
a = np.delete(a, 2, 1)
print(a)

Output:

[[1. 1.]
 [0. 2.]
 [1. 4.]
 [0. 6.]
 [1. 7.]]

What do you think?

Answer 1

Score: 4

You can compute the indices where the value changes:

idx = np.where(np.diff(a[:, 0]) != 0)[0]
out = a[np.r_[0, idx + 1]]

Output:

array([[1, 1],
       [0, 2],
       [1, 4],
       [0, 6],
       [1, 7]])
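
The same idea can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name first_row_per_run and the key_col parameter are made up for illustration. It compares neighbouring keys with != instead of np.diff, on the assumption that this is preferable when the keys are not numeric, and it returns an empty input unchanged.

```python
import numpy as np

def first_row_per_run(arr, key_col=0):
    """Return the first row of every run of equal consecutive values in key_col.

    Hypothetical helper for illustration only.
    """
    arr = np.asarray(arr)
    if arr.shape[0] == 0:
        return arr
    keys = arr[:, key_col]
    # True where a row starts a new run; the first row always starts one.
    starts = np.concatenate(([True], keys[1:] != keys[:-1]))
    return arr[starts]

a = np.array([[1, 1], [0, 2], [0, 3], [1, 4], [1, 5], [0, 6], [1, 7]])
out = first_row_per_run(a)
print(out)
```

For the array from the question this gives the same five rows as the output above.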

Minimum per group

I initially misread the question and thought you wanted the minimum per group. For that, you would combine the indices with np.minimum.reduceat:

idx = np.where(np.diff(a[:, 0])!=0)[0]
out = np.minimum.reduceat(a, np.r_[0, idx+1], axis=0)

Example:

l1 = [1,1,0,0,1,1,0,1]
l2 = [1,0,3,2,4,5,6,7]
a = np.array([l1, l2]).T

idx = np.where(np.diff(a[:, 0])!=0)[0]
out = np.minimum.reduceat(a, np.r_[0, idx+1], axis=0)

array([[1, 0],
       [0, 2],
       [1, 4],
       [0, 6],
       [1, 7]])
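
For readers unfamiliar with reduceat, the following sketch shows what it does on a single column. The names x, starts, seg_min and manual are chosen here for illustration. Each output element is the minimum of one slice between consecutive start indices, and the last slice runs to the end of the array.

```python
import numpy as np

x = np.array([1, 0, 3, 2, 4, 5, 6, 7])  # column 1 of the example above
starts = np.array([0, 2, 4, 6, 7])      # first index of each run (np.r_[0, idx + 1])

# Minimum of x[starts[i]:starts[i + 1]] for each i.
seg_min = np.minimum.reduceat(x, starts)

# The same result computed with explicit slices, for comparison.
ends = np.concatenate((starts[1:], [len(x)]))
manual = np.array([x[s:e].min() for s, e in zip(starts, ends)])

print(seg_min)  # [0 2 4 6 7]
print(manual)   # [0 2 4 6 7]
```

With axis=0 on a 2D array the reduction is applied to each column separately, so an output row is not guaranteed to be an actual row of the input when there are several value columns.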

Sorting per group

Use np.lexsort:

group = np.r_[0, np.cumsum(np.diff(a[:, 0])!=0)]
out = a[np.lexsort(np.c_[a[:, 1:], group].T)]

Example:

l1 = [1,1,0,0,1,1,0,1]
l2 = [1,0,3,2,4,5,6,7]
a = np.array([l1, l2]).T

group = np.r_[0, np.cumsum(np.diff(a[:, 0])!=0)]
# array([0, 0, 1, 1, 2, 2, 3, 4])

out = a[np.lexsort(np.c_[a[:, 1:], group].T)]

array([[1, 0],
       [1, 1],
       [0, 2],
       [0, 3],
       [1, 4],
       [1, 5],
       [0, 6],
       [1, 7]])
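
The key order passed to np.lexsort is easy to get wrong, so here is the same example written with an explicit key tuple. This is a sketch equivalent to the code above for a two-column array; the variable order is introduced only for illustration.

```python
import numpy as np

l1 = [1, 1, 0, 0, 1, 1, 0, 1]
l2 = [1, 0, 3, 2, 4, 5, 6, 7]
a = np.array([l1, l2]).T

# Run label for every row: increases by one each time column 0 changes.
group = np.concatenate(([0], np.cumsum(a[1:, 0] != a[:-1, 0])))

# np.lexsort treats the LAST key as the primary one, so rows are ordered
# by run label first and by column 1 within each run.
order = np.lexsort((a[:, 1], group))
out = a[order]

print(group)  # [0 0 1 1 2 2 3 4]
print(order)  # [1 0 3 2 4 5 6 7]
```

Taking the first row of each run in out then gives the row with the smallest value in column 1 for that run, as an actual row of the input.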

Answer 2

Score: 1

Another possible solution, which is based on numpy.roll:

m = a[:, 0] != np.roll(a[:,0], 1)
m[0] = True
a[m, :]

Output:

array([[1, 1],
       [0, 2],
       [1, 4],
       [0, 6],
       [1, 7]])
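
The line m[0] = True matters because np.roll wraps around, so the first key is compared with the last one. A small sketch with a made-up array whose first and last keys are equal shows the effect:

```python
import numpy as np

# Hypothetical input: the first and last keys are both 1.
a = np.array([[1, 1], [0, 2], [0, 3], [1, 4]])

# np.roll wraps around, so position 0 is compared with the LAST key (1 == 1).
m = a[:, 0] != np.roll(a[:, 0], 1)
wrapped = m.copy()
print(wrapped)  # [False  True False  True]

m[0] = True     # the first row always starts a group
print(a[m])     # rows [1 1], [0 2] and [1 4]
```

Without the correction, the first row would be dropped whenever the array starts and ends with the same key.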

huangapple
  • Published on March 3, 2023, 22:44:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75628495.html