Identify rows based on a condition and select one above and one below
Question
I have the following DataFrame:
ID | P | L | Score |
---|---|---|---|
1 | 1 | 0 | 5 |
1 | 1 | 1 | |
1 | 1 | 0 | 7 |
1 | 2 | 0 | 10 |
1 | 2 | 1 | |
1 | 2 | 0 | 8 |
1 | 2 | 1 | 5 |
1 | 2 | 0 | 7 |
1 | 2 | 1 | |
1 | 2 | 1 | |
1 | 2 | 0 | 8 |
2 | 1 | 0 | 9 |
2 | 1 | 0 | 9 |
2 | 1 | 0 | 10 |
2 | 1 | 1 | |
2 | 1 | 0 | 7 |
2 | 1 | 1 | |
For each (ID, P) group, I would like to select the row with L = 0 directly above and the row with L = 0 directly below the rows with L = 1. These rows were highlighted in red font in the original image. If there are several consecutive rows with L = 1, the same rule applies: one row above and one row below the run. Any suggestions? Thank you.
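A minimal sketch for rebuilding the sample table as a pandas DataFrame so the answers can be run as-is. It assumes the blank Score cells are missing values (NaN) and that the rows use the default index 0 to 16, which matches the index shown in the answers' outputs:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question; blank Score cells are assumed to be NaN
df = pd.DataFrame({
    "ID":    [1] * 11 + [2] * 6,
    "P":     [1] * 3 + [2] * 8 + [1] * 6,
    "L":     [0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    "Score": [5, nan, 7, 10, nan, 8, 5, 7, nan, nan, 8, 9, 9, 10, nan, 7, nan],
})
print(df)
```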
Answer 1
Score: 2
You can compare the values shifted with DataFrameGroupBy.shift against the actual values, chain the masks with | for bitwise OR and & for bitwise AND, and filter with boolean indexing:
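m11 = df['L'].eq(1)
m22 = df['L'].eq(0)
# next (shift(-1)) and previous (shift()) L value within each (ID, P) group
shifted1 = df.groupby(['ID','P'])['L'].shift(-1)
shifted2 = df.groupby(['ID','P'])['L'].shift()
m1 = m11 & shifted1.eq(0)   # L == 1 row followed by a 0
m2 = m22 & shifted2.eq(1)   # L == 0 row directly below a 1
m3 = m22 & shifted1.eq(1)   # L == 0 row directly above a 1
m4 = m11 & shifted2.eq(0)   # L == 1 row preceded by a 0
out = df[(m1 | m2 | m3 | m4) & m22]
print(out)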
ID P L Score
0 1 1 0 5.0
2 1 1 0 7.0
3 1 2 0 10.0
5 1 2 0 8.0
7 1 2 0 7.0
10 1 2 0 8.0
13 2 1 0 10.0
15 2 1 0 7.0
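Since m1 and m4 both require L == 1, the final & m22 always discards them, so only m2 and m3 affect the result. A condensed sketch of the same shift-based idea, with the sample data rebuilt inline so it runs on its own (the helper name rows_around_ones is hypothetical, not part of pandas):

```python
import numpy as np
import pandas as pd

nan = np.nan
# The question's sample data (blank Score cells assumed to be NaN)
df = pd.DataFrame({
    "ID":    [1] * 11 + [2] * 6,
    "P":     [1] * 3 + [2] * 8 + [1] * 6,
    "L":     [0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    "Score": [5, nan, 7, 10, nan, 8, 5, 7, nan, nan, 8, 9, 9, 10, nan, 7, nan],
})


def rows_around_ones(frame, keys=("ID", "P"), col="L"):
    """Hypothetical helper: L == 0 rows directly above or below an L == 1 row, per group."""
    by_group = frame.groupby(list(keys))[col]
    is_zero = frame[col].eq(0)
    below_a_one = by_group.shift(1).eq(1)   # previous row in the same group has L == 1
    above_a_one = by_group.shift(-1).eq(1)  # next row in the same group has L == 1
    return frame[is_zero & (below_a_one | above_a_one)]


out = rows_around_ones(df)
print(out)
```

The shifted values at group edges are NaN, and NaN.eq(1) is False, so the first and last rows of a group are never matched through a neighbouring group.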
Another idea is to replace 0 with missing values and use GroupBy.bfill and GroupBy.ffill with the limit parameter:
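# replace 0 with NaN so that only the 1s are propagated by the fills
g = df.assign(L=df['L'].mask(df['L'].eq(0))).groupby(['ID','P'])['L']
m22 = df['L'].eq(0)
# limit=1: bfill reaches the 0 directly above a 1, ffill the 0 directly below
out = df[(g.bfill(limit=1).eq(1) & m22) | (g.ffill(limit=1).eq(1) & m22)]
print(out)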
ID P L Score
0 1 1 0 5.0
2 1 1 0 7.0
3 1 2 0 10.0
5 1 2 0 8.0
7 1 2 0 7.0
10 1 2 0 8.0
13 2 1 0 10.0
15 2 1 0 7.0
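To see what limit=1 does in this approach, here is a minimal sketch on a single group: the L values of the ID = 2, P = 1 group, renumbered from 0. Because only one group is involved, it uses plain Series.bfill and Series.ffill instead of the grouped versions:

```python
import pandas as pd

# L values of the (ID=2, P=1) group from the question, positions 0..5
s = pd.Series([0, 0, 0, 1, 0, 1])
is_zero = s.eq(0)

# Turn every 0 into NaN so that only the 1s are valid values to propagate
masked = s.mask(is_zero)

# limit=1 fills at most one NaN next to each 1
from_below = masked.bfill(limit=1).eq(1) & is_zero  # a 0 directly above a 1
from_above = masked.ffill(limit=1).eq(1) & is_zero  # a 0 directly below a 1

selected = s[from_below | from_above]
print(selected.index.tolist())  # [2, 4]
```

Positions 2 and 4 correspond to rows 13 and 15 in the full DataFrame. Without limit=1, bfill would also mark positions 0 and 1.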
Answer 2
Score: 1
You can use a simple double groupby.shift for boolean indexing:
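m = df['L'].eq(1)
g = m.groupby([df['ID'], df['P']])
# True if the previous or the next row in the same (ID, P) group has L == 1
m2 = g.shift(fill_value=False) | g.shift(-1, fill_value=False)
# keep those neighbours, but drop the L == 1 rows themselves
out = df[m2 & ~m]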
Output:
ID P L Score
0 1 1 0 5.0
2 1 1 0 7.0
3 1 2 0 10.0
5 1 2 0 8.0
7 1 2 0 7.0
10 1 2 0 8.0
13 2 1 0 10.0
15 2 1 0 7.0
Intermediates:
ID P L Score m shift shift(-1) m2 m2&~m
0 1 1 0 5.0 False False True True True
1 1 1 1 NaN True False False False False
2 1 1 0 7.0 False True False True True
3 1 2 0 10.0 False False True True True
4 1 2 1 NaN True False False False False
5 1 2 0 8.0 False True True True True
6 1 2 1 5.0 True False False False False
7 1 2 0 7.0 False True True True True
8 1 2 1 NaN True False True True False
9 1 2 1 NaN True True False True False
10 1 2 0 8.0 False True False True True
11 2 1 0 9.0 False False False False False
12 2 1 0 9.0 False False False False False
13 2 1 0 10.0 False False True True True
14 2 1 1 NaN True False False False False
15 2 1 0 7.0 False True True True True
16 2 1 1 NaN True False False False False
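The shift is done per group so that a flag never crosses from one (ID, P) group into the next. In the question's data every group boundary falls between two L = 0 rows, so a plain (ungrouped) shift happens to give the same result there. A minimal sketch with hypothetical data where it does matter: group (1, 1) ends with L = 1 and group (1, 2) starts with L = 0:

```python
import pandas as pd

# Hypothetical data: group (1, 1) ends with L == 1, group (1, 2) starts with L == 0
df = pd.DataFrame({"ID": [1, 1, 1, 1], "P": [1, 1, 2, 2], "L": [0, 1, 0, 0]})

m = df["L"].eq(1)

# Ungrouped shift: row 2 sees row 1's flag even though it belongs to another group
ungrouped = m.shift(fill_value=False) | m.shift(-1, fill_value=False)

# Grouped shift, as in the answer: flags stay inside each (ID, P) group
g = m.groupby([df["ID"], df["P"]])
grouped = g.shift(fill_value=False) | g.shift(-1, fill_value=False)

print(df[ungrouped & ~m].index.tolist())  # [0, 2]
print(df[grouped & ~m].index.tolist())    # [0]
```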