pandas: select rows and cell values based on column and other conditions

Question

How can I do the following in pandas?

import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
    'ColD': [0,0,0,0,0, 22222,0,0,0,0,0,48,0,0,52,0,0,0]
})

Disregard all rows before the 1st and after the last occurrence of X in ColC.

Find the 1st occurrence of X in ColC, move one row down, and take the value of ColA from that row (call it Y). In df it is 11111.

Find the 2nd occurrence of X in ColC and take the value of ColB from that row (call it Z). In df it is 33333.

Write the result Z - Y into ColD on the row of the 2nd occurrence of X (here 33333 - 11111 = 22222).

Repeat for each consecutive pair of occurrences of X (2nd and 3rd, 3rd and 4th, 4th and 5th, and so on) until the end of df.

The expected result is shown in ColD above.
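The steps above can be spelled out as an explicit loop. This is only a minimal sketch of one reading of the question, not code from the thread; the names `x_pos` and `col_d` are made up for illustration, and it assumes the default 0..n-1 index and that an X is never the last row.

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
})

# Positions of every 'X' in ColC, in order (hypothetical helper name)
x_pos = [i for i, value in enumerate(df['ColC']) if value == 'X']

# Start from all zeros; rows outside the X pairs are never touched
col_d = [0] * len(df)

# Walk consecutive pairs of X positions: (1st, 2nd), (2nd, 3rd), ...
for prev_x, next_x in zip(x_pos, x_pos[1:]):
    y = df['ColA'].iloc[prev_x + 1]  # ColA one row below the earlier X
    z = df['ColB'].iloc[next_x]      # ColB on the row of the later X
    col_d[next_x] = int(z - y)

df['ColD'] = col_d
print(df)
```

A loop like this is slower than a vectorized solution on large frames, but it is handy as a reference to check a vectorized version against.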

Answer 1

Score: 1

Using some shift-fu and boolean indexing:

m = df['ColC'].eq('X')

df['ColD'] = df.loc[m, 'ColB'] - df['ColA'].shift(-1)[m].shift()

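Or, if you want 0s instead of NaN on the other rows:

m = df['ColC'].eq('X')

# fillna's `downcast` keyword is deprecated in recent pandas releases,
# so cast back to integers explicitly
df['ColD'] = (df['ColB'].sub(df['ColA'].shift(-1)[m].shift())
                .fillna(0)
                .astype('int64')
             )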

Output:

     ColA   ColB ColC   ColD
0       1     11    A      0
1       2      5    B      0
2       3     22    X      0
3   11111     66    C      0
4       5      4    D      0
5       6  33333    X  22222
6       7     45    E      0
7       8     91    F      0
8       9     78    G      0
9       1     10    A      0
10      2     17    B      0
11      3     55    X     48
12      4     73    C      0
13      5     85    D      0
14      6     56    X     52
15      7     99    E      0
16      8      4    F      0
17      9     74    G      0

Intermediate:

# get the value of the row following the previous X
df['ColA'].shift(-1)[m].shift()

2         NaN
5     11111.0
11        7.0
14        4.0
Name: ColA, dtype: float64
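To make the chained expression easier to follow, it can be split into named steps. This is a sketch that only restates the answer's one-liner; the intermediate names (`below`, `below_x`, `prev_below_x`, `diff`) are hypothetical and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
})

m = df['ColC'].eq('X')           # True on the rows where ColC is 'X'
below = df['ColA'].shift(-1)     # every row now holds ColA from the row below it
below_x = below[m]               # keep only the X rows: ColA just below each X
prev_below_x = below_x.shift()   # move each value down to the next X row

# Index alignment pairs ColB on each X row with the value carried
# over from the previous X; the first X has no predecessor, so NaN
diff = df.loc[m, 'ColB'] - prev_below_x
print(diff)
```

The second `shift()` operates on the filtered Series, so it moves values from one X row to the next X row rather than to the next row of df.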

Answer 2

Score: 0

Might this code be useful for your solution? I also attached a screenshot of the output so you can see how it works.

import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G']
})

# Find the index labels of the first and last occurrence of 'X' in ColC
is_x = df['ColC'].eq('X')
first_appear = is_x.idxmax()
last_appear = is_x[::-1].idxmax()

# Keep the rows after the first 'X' up to and including the last 'X';
# .copy() avoids a SettingWithCopyWarning on the assignment below
final_df = df.loc[first_appear+1:last_appear].copy()

# Row-wise difference: each row's own ColB minus its own ColA
final_df['ColD'] = final_df['ColB'] - final_df['ColA']
print(final_df)
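If the goal is both to drop the rows outside the first and last X and to compute the pairwise differences from the question, the `idxmax` slicing can be combined with the shift-based expression from Answer 1. This is a hedged sketch, not something either answer proposes; the names `trimmed`, `first_x`, `last_x` and `mt` are made up here, and it assumes ColC contains at least one X (on an all-False mask `idxmax` simply returns the first label).

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
})

m = df['ColC'].eq('X')
first_x = m.idxmax()        # label of the first X
last_x = m[::-1].idxmax()   # label of the last X

# .loc slicing is label-based and inclusive on both ends;
# .copy() makes the slice safe to assign into
trimmed = df.loc[first_x:last_x].copy()

mt = trimmed['ColC'].eq('X')
trimmed['ColD'] = (trimmed['ColB']
                   .sub(trimmed['ColA'].shift(-1)[mt].shift())
                   .fillna(0)
                   .astype('int64'))
print(trimmed)
```

Trimming first and then computing gives the same ColD values on the X rows as computing on the full frame, because the rows that were dropped never take part in any X pair.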


huangapple
  • Published on July 7, 2023, 01:56:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76631421.html