Posted: 2023-07-03 14:39:42

Identify rows based on a condition and select one above and one below

Question

I have the following DataFrame:

ID	P	L	Score
1	1	0	5
1	1	1
1	1	0	7
1	2	0	10
1	2	1
1	2	0	8
1	2	1	5
1	2	0	7
1	2	1
1	2	1
1	2	0	8
2	1	0	9
2	1	0	9
2	1	0	10
2	1	1
2	1	0	7
2	1	1

Within each group of ID and P, I would like to select the row with L = 0 directly above and the row with L = 0 directly below the rows with L = 1. These rows are in red font in the image. If there are multiple rows with L = 1, the same rule applies (that is, one row above and one row below). Any suggestions? Thank you.
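To try the answers below, the sample data can be rebuilt roughly as follows. This is only a sketch: it assumes the blank Score cells are missing values (NaN), which matches the NaN shown in the intermediate output of the second answer.

```python
import numpy as np
import pandas as pd

nan = np.nan  # assumption: blank Score cells in the question are missing values

# Rows are listed group by group: (ID=1, P=1), (ID=1, P=2), (ID=2, P=1)
df = pd.DataFrame({
    "ID": [1] * 11 + [2] * 6,
    "P": [1] * 3 + [2] * 8 + [1] * 6,
    "L": [0, 1, 0,  0, 1, 0, 1, 0, 1, 1, 0,  0, 0, 0, 1, 0, 1],
    "Score": [5, nan, 7,  10, nan, 8, 5, 7, nan, nan, 8,  9, 9, 10, nan, 7, nan],
})
print(df)
```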

Answer 1

Score: 2

You can compare each value with its shifted neighbors from DataFrameGroupBy.shift, chain the masks with | (bitwise OR) and & (bitwise AND), and filter with boolean indexing:

# Masks for the flagged rows (L == 1) and the candidate rows (L == 0)
m11 = df['L'].eq(1)
m22 = df['L'].eq(0)
# Next and previous value of L within each (ID, P) group
shifted1 = df.groupby(['ID','P'])['L'].shift(-1)
shifted2 = df.groupby(['ID','P'])['L'].shift()
m1 = m11 & shifted1.eq(0)
m2 = m22 & shifted2.eq(1)
m3 = m22 & shifted1.eq(1)
m4 = m11 & shifted2.eq(0)
# Keep only the L == 0 rows that sit next to an L == 1 row
out = df[(m1 | m2 | m3 | m4) & m22]
print(out)
    ID  P  L  Score
0    1  1  0    5.0
2    1  1  0    7.0
3    1  2  0   10.0
5    1  2  0    8.0
7    1  2  0    7.0
10   1  2  0    8.0
13   2  1  0   10.0
15   2  1  0    7.0
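One observation on the masks above: m1 and m4 both require L == 1, so the final `& m22` always cancels them, and only m2 and m3 contribute. The sketch below checks that on the sample data. The names `full` and `reduced` are introduced here for illustration, and the frame is rebuilt on the assumption that blank Score cells are NaN.

```python
import numpy as np
import pandas as pd

nan = np.nan  # assumption: blank Score cells are missing values
df = pd.DataFrame({
    "ID": [1] * 11 + [2] * 6,
    "P": [1] * 3 + [2] * 8 + [1] * 6,
    "L": [0, 1, 0,  0, 1, 0, 1, 0, 1, 1, 0,  0, 0, 0, 1, 0, 1],
    "Score": [5, nan, 7,  10, nan, 8, 5, 7, nan, nan, 8,  9, 9, 10, nan, 7, nan],
})

m11 = df["L"].eq(1)
m22 = df["L"].eq(0)
grouped = df.groupby(["ID", "P"])["L"]
shifted1 = grouped.shift(-1)  # next value of L within the group
shifted2 = grouped.shift()    # previous value of L within the group

# The four-mask expression from the answer
full = (
    (m11 & shifted1.eq(0))
    | (m22 & shifted2.eq(1))
    | (m22 & shifted1.eq(1))
    | (m11 & shifted2.eq(0))
) & m22

# Reduced form: an L == 0 row whose previous or next neighbor is L == 1
reduced = m22 & (shifted1.eq(1) | shifted2.eq(1))

print(full.equals(reduced))
print(df[reduced].index.tolist())
```

Both forms select the same rows here; the shorter one may be easier to read, while the longer one spells out every neighbor relationship.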

Another idea is to replace 0 with missing values and use GroupBy.bfill and GroupBy.ffill with the limit parameter:

g = df.assign(L = df['L'].mask(df['L'].eq(0))).groupby(['ID','P'])['L']
m22 = df['L'].eq(0)
out = df[(g.bfill(limit=1).eq(1) & m22) | (g.ffill(limit=1).eq(1) & m22)]
print(out)
    ID  P  L  Score
0    1  1  0    5.0
2    1  1  0    7.0
3    1  2  0   10.0
5    1  2  0    8.0
7    1  2  0    7.0
10   1  2  0    8.0
13   2  1  0   10.0
15   2  1  0    7.0
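To see why the fill-based variant works, it can help to look at the intermediate columns. The sketch below is illustrative only: the column names `masked`, `bfill_1`, `ffill_1` and `keep` are made up here, and the frame is rebuilt on the assumption that blank Score cells are NaN.

```python
import numpy as np
import pandas as pd

nan = np.nan  # assumption: blank Score cells are missing values
df = pd.DataFrame({
    "ID": [1] * 11 + [2] * 6,
    "P": [1] * 3 + [2] * 8 + [1] * 6,
    "L": [0, 1, 0,  0, 1, 0, 1, 0, 1, 1, 0,  0, 0, 0, 1, 0, 1],
    "Score": [5, nan, 7,  10, nan, 8, 5, 7, nan, nan, 8,  9, 9, 10, nan, 7, nan],
})

# Turn every 0 into NaN so that only the L == 1 rows carry a value
masked = df["L"].mask(df["L"].eq(0))
g = masked.groupby([df["ID"], df["P"]])

steps = df.assign(
    masked=masked,
    bfill_1=g.bfill(limit=1),  # copies a 1 onto the row just above it
    ffill_1=g.ffill(limit=1),  # copies a 1 onto the row just below it
)
# A row is kept when it is an L == 0 row that received a 1 from either fill
steps["keep"] = df["L"].eq(0) & (steps["bfill_1"].eq(1) | steps["ffill_1"].eq(1))
print(steps)
```

Because the fills run per group and stop after one row, a 1 never leaks across an (ID, P) boundary or past the nearest neighbor.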

Answer 2

Score: 1

You can use a simple double groupby.shift for boolean indexing:

m = df['L'].eq(1)
g = m.groupby([df['ID'], df['P']])
m2 = (g.shift(fill_value=False) | g.shift(-1, fill_value=False))
out = df[m2 & ~m]

Output:

    ID  P  L  Score
0    1  1  0    5.0
2    1  1  0    7.0
3    1  2  0   10.0
5    1  2  0    8.0
7    1  2  0    7.0
10   1  2  0    8.0
13   2  1  0   10.0
15   2  1  0    7.0

Intermediates:

    ID  P  L  Score      m  shift  shift(-1)     m2  m2&~m
0    1  1  0    5.0  False  False       True   True   True
1    1  1  1    NaN   True  False      False  False  False
2    1  1  0    7.0  False   True      False   True   True
3    1  2  0   10.0  False  False       True   True   True
4    1  2  1    NaN   True  False      False  False  False
5    1  2  0    8.0  False   True       True   True   True
6    1  2  1    5.0   True  False      False  False  False
7    1  2  0    7.0  False   True       True   True   True
8    1  2  1    NaN   True  False       True   True  False
9    1  2  1    NaN   True   True      False   True  False
10   1  2  0    8.0  False   True      False   True   True
11   2  1  0    9.0  False  False      False  False  False
12   2  1  0    9.0  False  False      False  False  False
13   2  1  0   10.0  False  False       True   True   True
14   2  1  1    NaN   True  False      False  False  False
15   2  1  0    7.0  False   True       True   True   True
16   2  1  1    NaN   True  False      False  False  False
