2023-07-07 01:56:14

pandas select rows and cell values based on column and other conditions

Question

How can I do the following in pandas?

import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
    'ColD': [0, 0, 0, 0, 0, 22222, 0, 0, 0, 0, 0, 48, 0, 0, 52, 0, 0, 0]
})

Disregard all rows before the first and after the last occurrence of X in ColC.

Find the first occurrence of X in ColC, go one row below it, and take the value (say, Y) of ColA from that row (in df it is 11111).

Find the second occurrence of X in ColC and take the value (say, Z) of ColB from that row (in df it is 33333).

Write the result (Z - Y) in ColD on the row of the second occurrence of X (it should be 33333 - 11111 = 22222).

Repeat for the second and third occurrences of X, the third and fourth, the fourth and fifth, and so on until the end of df.

The expected result is shown in ColD.
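To make the steps above concrete, here is a minimal loop-based sketch of the same rule. The helper name `pairwise_x_diff` and its `marker` parameter are made up for illustration and are not part of the original post; it is only meant as a readable reference to check a vectorized solution against.

```python
import pandas as pd

df = pd.DataFrame({
    "ColA": [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "ColB": [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    "ColC": ["A", "B", "X", "C", "D", "X", "E", "F", "G",
             "A", "B", "X", "C", "D", "X", "E", "F", "G"],
})


def pairwise_x_diff(frame, marker="X"):
    """Hypothetical helper: build ColD as a list by following the steps above."""
    # Positions of every marker row, in order of appearance.
    x_pos = [i for i, value in enumerate(frame["ColC"]) if value == marker]
    col_d = [0] * len(frame)
    # Walk consecutive pairs: (1st, 2nd), (2nd, 3rd), ...
    for prev, curr in zip(x_pos, x_pos[1:]):
        y = frame["ColA"].iloc[prev + 1]  # ColA on the row just below the earlier X
        z = frame["ColB"].iloc[curr]      # ColB on the row of the later X
        col_d[curr] = int(z - y)
    return col_d


df["ColD"] = pairwise_x_diff(df)
print(df.loc[df["ColC"].eq("X"), ["ColC", "ColD"]])
```

Rows before the first X and after the last X are never touched by the loop, so they simply keep the default 0.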

Answer 1

Score: 1

Using some shift-fu and boolean indexing:

# True on the rows where ColC is X
m = df['ColC'].eq('X')

df['ColD'] = df.loc[m, 'ColB'] - df['ColA'].shift(-1)[m].shift()

Or, if you want 0s instead of NaN on the other rows:

m = df['ColC'].eq('X')

# fill the gaps with 0 and restore the integer dtype
df['ColD'] = (df['ColB'].sub(df['ColA'].shift(-1)[m].shift())
                .fillna(0).astype(int)
             )

Output:

     ColA   ColB ColC   ColD
0       1     11    A      0
1       2      5    B      0
2       3     22    X      0
3   11111     66    C      0
4       5      4    D      0
5       6  33333    X  22222
6       7     45    E      0
7       8     91    F      0
8       9     78    G      0
9       1     10    A      0
10      2     17    B      0
11      3     55    X     48
12      4     73    C      0
13      5     85    D      0
14      6     56    X     52
15      7     99    E      0
16      8      4    F      0
17      9     74    G      0

Intermediate:

# get the value of the row following the previous X
df['ColA'].shift(-1)[m].shift()

2         NaN
5     11111.0
11        7.0
14        4.0
Name: ColA, dtype: float64
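If the chained one-liner is hard to follow, it may help to split it into named steps. The sketch below does that; the variable names (`below`, `below_at_x`, `y`, `z`, `result`) are chosen here for illustration only, and the final `reindex` is just one way to get back to a full-length column.

```python
import pandas as pd

df = pd.DataFrame({
    "ColA": [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "ColB": [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    "ColC": ["A", "B", "X", "C", "D", "X", "E", "F", "G",
             "A", "B", "X", "C", "D", "X", "E", "F", "G"],
})

m = df["ColC"].eq("X")        # True on the rows holding an X
below = df["ColA"].shift(-1)  # every row now sees the ColA value one row down
below_at_x = below[m]         # keep only the X rows (index 2, 5, 11, 14)
y = below_at_x.shift()        # push each value down to the next X row
z = df.loc[m, "ColB"]         # ColB on the X rows

# Subtraction aligns on the index; the first X has no predecessor, so it is NaN.
# reindex restores the non-X rows, which are then filled with 0.
result = (z - y).reindex(df.index).fillna(0).astype(int)
print(result[m])
```

The second `shift()` works on the filtered series, so it moves values from one X row to the next X row rather than by one physical row; that is what pairs each X with the previous one.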

Answer 2

Score: 0

Might this code be useful for your solution? I also attached a screenshot of the output so that you have a clear idea of how it works.

import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G']
})

# Find the index labels of the first and last occurrence of 'X' in ColC
first_appear = df['ColC'].eq('X').idxmax()
last_appear = df['ColC'].eq('X')[::-1].idxmax()
# Keep the rows after the first X up to and including the last X;
# .copy() avoids a SettingWithCopyWarning on the assignment below
final_df = df.loc[first_appear+1:last_appear].copy()
# Row-wise difference between ColB and ColA within the slice
final_df['ColD'] = final_df['ColB'] - final_df['ColA']
print(final_df)
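One caveat with locating the bounds this way: `idxmax()` on a boolean mask returns the first label even when the mask contains no True at all, so a column without any X would silently yield 0 rather than "not found". A small guarded sketch, where `x_bounds` is a hypothetical helper name and not something from the answer:

```python
import pandas as pd


def x_bounds(col, marker="X"):
    """Hypothetical helper: (first, last) index labels of marker, or None if absent."""
    mask = col.eq(marker)
    if not mask.any():
        # Without this check, idxmax() would report the first label of the Series.
        return None
    return mask.idxmax(), mask[::-1].idxmax()


col = pd.Series(list("ABXCDXEFGABXCDXEFG"))
no_x = pd.Series(list("ABC"))

print(x_bounds(col))
print(x_bounds(no_x))
```

The returned pair can be passed to `df.loc[first:last]` in the same way as `first_appear` and `last_appear` above.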
