How to groupby numpy ndarray and return first row from each group. Now sort before
Question
I have an ndarray:
[[1 1]
[0 2]
[0 3]
[1 4]
[1 5]
[0 6]
[1 7]]
I expect a reduced result like this:
[[1 1]
[0 2]
[1 4]
[0 6]
[1 7]]
The resulting ndarray should contain the first row of each group.
The groups are built from the values in column 0, which are either 0 or 1; a group is a run of consecutive rows with the same value.
A similar problem was solved in the thread "Is there any numpy group by function?", but there the key was sorted, and in my case that approach does not work.
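import numpy as np
l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T
print(a)
# np.unique gives the first occurrence of each distinct value in column 0,
# not the first row of each run of consecutive equal values
values, indexes = np.unique(a[:, 0], return_index=True)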
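To illustrate why np.unique does not fit here: with return_index=True it reports the first occurrence of each distinct value in the whole column, so only two rows come back, while the runs of consecutive equal values start at five positions. A minimal sketch comparing the two on the question's data (the run_start mask is an illustration, not code from the question):

```python
import numpy as np

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T

# First occurrence of each distinct value (values come back sorted)
values, indexes = np.unique(a[:, 0], return_index=True)
# values.tolist() == [0, 1], indexes.tolist() == [1, 0]

# First row of each run: row 0, plus every row whose key differs from the row above
run_start = np.concatenate(([True], a[1:, 0] != a[:-1, 0]))
# np.flatnonzero(run_start).tolist() == [0, 1, 3, 5, 6]
print(a[run_start])
```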
In pandas this can be done as follows (a solution from Stack Overflow; I don't remember the author, sorry for the missing link):
m1 = (df['c0'] != df['c0'].shift(1)).cumsum()
df = df.groupby([df['c0'], m1]).head(1)
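For reference, a self-contained version of that pandas snippet, assuming the key column is called c0 and the second column c1 (the name c1 is an assumption; the snippet above only shows c0):

```python
import pandas as pd

df = pd.DataFrame({"c0": [1, 0, 0, 1, 1, 0, 1],
                   "c1": [1, 2, 3, 4, 5, 6, 7]})

# Run id: increases by one every time c0 differs from the previous row
m1 = (df["c0"] != df["c0"].shift(1)).cumsum()

# Keep the first row of each (value, run) group
out = df.groupby([df["c0"], m1]).head(1)
print(out.to_numpy())
```

GroupBy.head is a filter, so the kept rows stay in their original order, which is why the result lines up with the expected output.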
How can I do this with numpy?
Thanks for any solutions.
EDIT:
While mozway was writing a solution, I came up with something like this:
import numpy as np
l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T
print("solution")
# "shift" for numpy: prepend NaN and drop the last element of column 0
# (np.nan, since the np.NaN alias was removed in NumPy 2.0)
arr3 = np.array([np.nan])
arr4 = np.array(a[:-1, 0])
arr5 = np.concatenate([arr3, arr4])
print('arr5')
print(arr5)
# add the shifted column
a = np.c_[a, arr5]
# compare column 0 with the shifted column
dif_col = np.where(a[:, 0] != a[:, 2], True, False)
# add the comparison column
a = np.c_[a, dif_col]
# keep only rows where the value changed
mask = (a[:, 3] == True)
a = a[mask, :]
# remove the redundant helper columns
a = np.delete(a, 2, 1)
a = np.delete(a, 2, 1)
print(a)
Output:
[[1. 1.]
[0. 2.]
[1. 4.]
[0. 6.]
[1. 7.]]
What do you think?
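The same shift-and-compare idea can be written without the helper columns: comparing column 0 with itself offset by one row gives the mask directly, and the input dtype is preserved (the NaN padding is what turns the output above into floats). A sketch; first_row_per_run_shift is a hypothetical helper name:

```python
import numpy as np


def first_row_per_run_shift(a, key_col=0):
    """Keep the first row of each run of equal values in column key_col.

    Hypothetical helper. A row is kept when its key differs from the key of
    the previous row; row 0 is always kept, which plays the role of the NaN
    padding in the question's version. No float conversion happens.
    """
    key = a[:, key_col]
    changed = np.ones(len(a), dtype=bool)
    changed[1:] = key[1:] != key[:-1]
    return a[changed]


l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T
out = first_row_per_run_shift(a)
print(out)
```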
Answer 1
Score: 4
You can compute the indices where the value changes:
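# positions i where column 0 changes between row i and row i + 1
idx = np.where(np.diff(a[:, 0]) != 0)[0]
# each run starts at row 0 or at one of idx + 1
out = a[np.r_[0, idx + 1]]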
Output:
array([[1, 1],
[0, 2],
[1, 4],
[0, 6],
[1, 7]])
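A step-by-step sketch of the intermediate values on the question's data, to show what np.diff and np.r_ each contribute:

```python
import numpy as np

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T

# d[i] = a[i + 1, 0] - a[i, 0]; nonzero exactly where the key changes
d = np.diff(a[:, 0])            # d.tolist() == [-1, 0, 1, 0, -1, 1]
# indices i such that row i + 1 begins a new run
idx = np.where(d != 0)[0]       # idx.tolist() == [0, 2, 4, 5]
# row 0 always begins a run; the others begin at idx + 1
starts = np.r_[0, idx + 1]      # starts.tolist() == [0, 1, 3, 5, 6]
out = a[starts]
print(out)
```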
minimum per group
I initially misread the question and thought you wanted the minimum per group; for that you would need to combine these indices with np.minimum.reduceat:
idx = np.where(np.diff(a[:, 0]) != 0)[0]
out = np.minimum.reduceat(a, np.r_[0, idx + 1], axis=0)
Example:
l1 = [1, 1, 0, 0, 1, 1, 0, 1]
l2 = [1, 0, 3, 2, 4, 5, 6, 7]
a = np.array([l1, l2]).T
idx = np.where(np.diff(a[:, 0]) != 0)[0]
out = np.minimum.reduceat(a, np.r_[0, idx + 1], axis=0)
array([[1, 0],
[0, 2],
[1, 4],
[0, 6],
[1, 7]])
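One thing to keep in mind: np.minimum.reduceat with axis=0 takes the minimum of each column independently. With only the key column and one value column that is harmless (the key is constant within a run), but with more value columns the returned row can mix values from different rows. A small sketch on made-up 3-column data:

```python
import numpy as np

# Made-up data for illustration: key, x, y
a = np.array([[1, 5, 0],
              [1, 2, 9],
              [0, 7, 3]])

idx = np.where(np.diff(a[:, 0]) != 0)[0]
starts = np.r_[0, idx + 1]      # starts.tolist() == [0, 2]

# Column-wise minimum per run: in the first run, x comes from row 1, y from row 0
col_min = np.minimum.reduceat(a, starts, axis=0)
# col_min.tolist() == [[1, 2, 0], [0, 7, 3]]; [1, 2, 0] is not a row of a
print(col_min)
```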
sorting per group
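Using np.lexsort (the last key is the primary one, so rows are ordered by group first and by the value columns second):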
group = np.r_[0, np.cumsum(np.diff(a[:, 0]) != 0)]
out = a[np.lexsort(np.c_[a[:, 1:], group].T)]
Example:
l1 = [1, 1, 0, 0, 1, 1, 0, 1]
l2 = [1, 0, 3, 2, 4, 5, 6, 7]
a = np.array([l1, l2]).T
group = np.r_[0, np.cumsum(np.diff(a[:, 0]) != 0)]
# array([0, 0, 1, 1, 2, 2, 3, 4])
out = a[np.lexsort(np.c_[a[:, 1:], group].T)]
array([[1, 0],
[1, 1],
[0, 2],
[0, 3],
[1, 4],
[1, 5],
[0, 6],
[1, 7]])
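Combining the sort with the first-row selection gives the first row of each run after sorting within the run, i.e. the row with the smallest column-1 value per run, kept as a whole row rather than assembled column-wise. A sketch, assuming sorting by column 1 alone is what is wanted:

```python
import numpy as np

l1 = [1, 1, 0, 0, 1, 1, 0, 1]
l2 = [1, 0, 3, 2, 4, 5, 6, 7]
a = np.array([l1, l2]).T

# Run id per row: 0, 0, 1, 1, 2, 2, 3, 4
group = np.r_[0, np.cumsum(np.diff(a[:, 0]) != 0)]

# np.lexsort treats the LAST key as primary: sort by run id, then by column 1
order = np.lexsort((a[:, 1], group))
sorted_a = a[order]
sorted_group = group[order]

# Runs stay contiguous after sorting, so keep the first row of each run
first = np.concatenate(([True], sorted_group[1:] != sorted_group[:-1]))
out = sorted_a[first]
print(out)
```

With only two columns this matches the np.minimum.reduceat result; with more columns it always returns rows that actually exist in a.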
Answer 2
Score: 1
Another possible solution, based on numpy.roll:
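# compare column 0 with itself shifted down by one row (np.roll wraps around)
m = a[:, 0] != np.roll(a[:, 0], 1)
# row 0 was compared with the last row, so always keep it
m[0] = True
a[m, :]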
Output:
array([[1, 1],
[0, 2],
[1, 4],
[0, 6],
[1, 7]])
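Both answers select the same rows. The only subtlety in the roll version is the wrap-around: position 0 is compared with the last key, and in the question's data both are 1, so without m[0] = True the first row would be dropped. A small randomized check that the two approaches agree (first_rows_roll and first_rows_diff are hypothetical helper names):

```python
import numpy as np


def first_rows_roll(a):
    # np.roll moves the last key into position 0, so row 0 is forced to True
    m = a[:, 0] != np.roll(a[:, 0], 1)
    m[0] = True
    return a[m, :]


def first_rows_diff(a):
    # rows where the key changes, plus row 0
    idx = np.where(np.diff(a[:, 0]) != 0)[0]
    return a[np.r_[0, idx + 1]]


rng = np.random.default_rng(0)
for _ in range(100):
    n = int(rng.integers(1, 20))
    # random 0/1 keys in column 0, row numbers in column 1
    a = np.c_[rng.integers(0, 2, n), np.arange(n)]
    assert np.array_equal(first_rows_roll(a), first_rows_diff(a))
```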