Identify rows based on a condition and select one above and one below

Question

I have the following DataFrame:

ID  P  L  Score
 1  1  0      5
 1  1  1
 1  1  0      7
 1  2  0     10
 1  2  1
 1  2  0      8
 1  2  1      5
 1  2  0      7
 1  2  1
 1  2  1
 1  2  0      8
 2  1  0      9
 2  1  0      9
 2  1  0     10
 2  1  1
 2  1  0      7
 2  1  1

For each row with L = 1, I would like to select one row with L = 0 above it and one row with L = 0 below it, grouped by ID and P. These rows are in red font in the image. If a group has multiple rows with L = 1, the same rule applies to each of them (that is, one row above and one row below). Any suggestions? Thank you.
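
For anyone who wants to reproduce the example, the table above can be built roughly as follows. This is only a sketch: the blank Score cells are assumed to be missing values (NaN), which is how they appear in the printed output of the answers.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank Score cells are assumed to be NaN
df = pd.DataFrame({
    "ID": [1] * 11 + [2] * 6,
    "P": [1] * 3 + [2] * 8 + [1] * 6,
    "L": [0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    "Score": [5, np.nan, 7, 10, np.nan, 8, 5, 7, np.nan, np.nan, 8,
              9, 9, 10, np.nan, 7, np.nan],
})
print(df)
```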

Answer 1

Score: 2

You can compare each value with its neighbours obtained from DataFrameGroupBy.shift, chain the masks with | (bitwise OR) and & (bitwise AND), and filter with boolean indexing:

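# Rows where L is 1 and rows where L is 0
m11 = df['L'].eq(1)
m22 = df['L'].eq(0)

# Next and previous value of L within each (ID, P) group
shifted1 = df.groupby(['ID','P'])['L'].shift(-1)
shifted2 = df.groupby(['ID','P'])['L'].shift()

# L == 1 rows followed by a 0, and L == 0 rows preceded by a 1
m1 = m11 & shifted1.eq(0)
m2 = m22 & shifted2.eq(1)

# L == 0 rows followed by a 1, and L == 1 rows preceded by a 0
m3 = m22 & shifted1.eq(1)
m4 = m11 & shifted2.eq(0)

# Keep only the L == 0 rows among the matches
out = df[(m1 | m2 | m3 | m4) & m22]
print(out)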
    ID  P  L  Score
0    1  1  0    5.0
2    1  1  0    7.0
3    1  2  0   10.0
5    1  2  0    8.0
7    1  2  0    7.0
10   1  2  0    8.0
13   2  1  0   10.0
15   2  1  0    7.0
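
Because the final filter keeps only rows with L = 0, the two masks built from m11 (m1 and m4) cannot contribute to the result. A shorter sketch of the same idea is shown here; the names grouped_l, prev_l, next_l and is_zero are made up for illustration, and only the ID, P and L columns of the sample data are rebuilt so that the snippet runs on its own:

```python
import pandas as pd

# Sample data from the question (Score is omitted, it is not needed for the mask)
df = pd.DataFrame({
    "ID": [1] * 11 + [2] * 6,
    "P": [1] * 3 + [2] * 8 + [1] * 6,
    "L": [0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1],
})

grouped_l = df.groupby(["ID", "P"])["L"]
prev_l = grouped_l.shift()      # previous L within the group (NaN on the first row)
next_l = grouped_l.shift(-1)    # next L within the group (NaN on the last row)

# Keep L == 0 rows whose previous or next row in the same group has L == 1
is_zero = df["L"].eq(0)
out = df[is_zero & (prev_l.eq(1) | next_l.eq(1))]
print(out.index.tolist())
```

On the sample data this should select the same index labels as the output printed above.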

Another idea is to replace 0 with missing values and use GroupBy.bfill and GroupBy.ffill with the limit parameter:

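# Turn L == 0 into NaN so that only the 1s remain, then group by ID and P
g = df.assign(L = df['L'].mask(df['L'].eq(0))).groupby(['ID','P'])['L']
m22 = df['L'].eq(0)

# A 1 filled one step backward or forward marks the neighbouring L == 0 rows
out = df[(g.bfill(limit=1).eq(1) & m22) | (g.ffill(limit=1).eq(1) & m22)]
print(out)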
    ID  P  L  Score
0    1  1  0    5.0
2    1  1  0    7.0
3    1  2  0   10.0
5    1  2  0    8.0
7    1  2  0    7.0
10   1  2  0    8.0
13   2  1  0   10.0
15   2  1  0    7.0
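
To see why limit=1 matters, it may help to look at the filled columns side by side. The sketch below rebuilds only the ID, P and L columns of the sample data and uses the made-up column names masked, bfill1, ffill1 and keep; it is an illustration of the approach above, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (Score is omitted, it is not needed here)
df = pd.DataFrame({
    "ID": [1] * 11 + [2] * 6,
    "P": [1] * 3 + [2] * 8 + [1] * 6,
    "L": [0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1],
})

# 0 becomes NaN, 1 stays, so the fills can only ever propagate a 1
masked = df["L"].mask(df["L"].eq(0))
g = masked.groupby([df["ID"], df["P"]])

bfill1 = g.bfill(limit=1)   # NaN directly above a 1 becomes 1
ffill1 = g.ffill(limit=1)   # NaN directly below a 1 becomes 1

is_zero = df["L"].eq(0)
keep = is_zero & (bfill1.eq(1) | ffill1.eq(1))

# Show the intermediate columns next to the original data
print(df.assign(masked=masked, bfill1=bfill1, ffill1=ffill1, keep=keep))
```

With limit=1 each fill reaches only one row, so in the group ID = 2, P = 1 the rows at index 11 and 12 stay NaN in both filled columns and are not selected.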

Answer 2

Score: 1

You can use a simple double groupby.shift for boolean indexing:

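# True where L == 1
m = df['L'].eq(1)
g = m.groupby([df['ID'], df['P']])

# True where the previous or the next row in the group has L == 1
m2 = (g.shift(fill_value=False) | g.shift(-1, fill_value=False))

# Keep those rows, excluding the L == 1 rows themselves
out = df[m2 & ~m]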

Output:

    ID  P  L  Score
0    1  1  0    5.0
2    1  1  0    7.0
3    1  2  0   10.0
5    1  2  0    8.0
7    1  2  0    7.0
10   1  2  0    8.0
13   2  1  0   10.0
15   2  1  0    7.0

Intermediates:

    ID  P  L  Score      m  shift  shift(-1)     m2  m2&~m
0    1  1  0    5.0  False  False       True   True   True
1    1  1  1    NaN   True  False      False  False  False
2    1  1  0    7.0  False   True      False   True   True
3    1  2  0   10.0  False  False       True   True   True
4    1  2  1    NaN   True  False      False  False  False
5    1  2  0    8.0  False   True       True   True   True
6    1  2  1    5.0   True  False      False  False  False
7    1  2  0    7.0  False   True       True   True   True
8    1  2  1    NaN   True  False       True   True  False
9    1  2  1    NaN   True   True      False   True  False
10   1  2  0    8.0  False   True      False   True   True
11   2  1  0    9.0  False  False      False  False  False
12   2  1  0    9.0  False  False      False  False  False
13   2  1  0   10.0  False  False       True   True   True
14   2  1  1    NaN   True  False      False  False  False
15   2  1  0    7.0  False   True       True   True   True
16   2  1  1    NaN   True  False      False  False  False
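
A table like the intermediates above can be rebuilt along these lines. This is a sketch that assumes the blank Score cells are NaN and reuses the column labels from the table; the variable name inter is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank Score cells are assumed to be NaN
df = pd.DataFrame({
    "ID": [1] * 11 + [2] * 6,
    "P": [1] * 3 + [2] * 8 + [1] * 6,
    "L": [0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    "Score": [5, np.nan, 7, 10, np.nan, 8, 5, 7, np.nan, np.nan, 8,
              9, 9, 10, np.nan, 7, np.nan],
})

m = df["L"].eq(1)
g = m.groupby([df["ID"], df["P"]])

prev_is_one = g.shift(fill_value=False)       # previous row in the group has L == 1
next_is_one = g.shift(-1, fill_value=False)   # next row in the group has L == 1
m2 = prev_is_one | next_is_one

# Column labels contain characters that are not valid identifiers, so pass a dict
inter = df.assign(**{
    "m": m,
    "shift": prev_is_one,
    "shift(-1)": next_is_one,
    "m2": m2,
    "m2&~m": m2 & ~m,
})
print(inter)
```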

huangapple
  • Posted on July 3, 2023 at 14:39:42
  • Please keep this link when reposting: https://go.coder-hub.com/76602376.html