pandas select rows and cell values based on column and other conditions
Question
How can I do the following in pandas?
import pandas as pd
df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
    'ColD': [0, 0, 0, 0, 0, 22222, 0, 0, 0, 0, 0, 48, 0, 0, 52, 0, 0, 0]
})
Disregard all rows before the first and after the last occurrence of X in ColC.
Find the first occurrence of X in ColC, move one row down, and take the value of ColA in that row (call it Y). In df it is 11111.
Find the second occurrence of X in ColC and take the value of ColB in that row (call it Z). In df it is 33333.
Write the result Z - Y into ColD in the row of the second occurrence of X (it should be 33333 - 11111 = 22222).
Repeat for the second and third occurrences of X, the third and fourth, the fourth and fifth, and so on until the end of df.
The expected result is shown in ColD.
Answer 1
Score: 1
Using some shift-fu and boolean indexing:
m = df['ColC'].eq('X')
df['ColD'] = df.loc[m, 'ColB'] - df['ColA'].shift(-1)[m].shift()
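Or, if you want 0s instead of NaN in the other rows:
m = df['ColC'].eq('X')
# fillna's downcast argument is deprecated in recent pandas releases,
# so cast back to integers explicitly instead
df['ColD'] = (df['ColB'].sub(df['ColA'].shift(-1)[m].shift())
              .fillna(0).astype(int)
             )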
Output:
     ColA   ColB ColC   ColD
0       1     11    A      0
1       2      5    B      0
2       3     22    X      0
3   11111     66    C      0
4       5      4    D      0
5       6  33333    X  22222
6       7     45    E      0
7       8     91    F      0
8       9     78    G      0
9       1     10    A      0
10      2     17    B      0
11      3     55    X     48
12      4     73    C      0
13      5     85    D      0
14      6     56    X     52
15      7     99    E      0
16      8      4    F      0
17      9     74    G      0
Intermediate:
# get the value of the row following the previous X
df['ColA'].shift(-1)[m].shift()
2 NaN
5 11111.0
11 7.0
14 4.0
Name: ColA, dtype: float64
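To make the chained expression easier to follow, here is a minimal sketch that splits it into named steps. The names next_a, y, z, diff and col_d are only illustrative and do not appear in the answer; the data is the question's df without ColD.

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G',
             'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
})

m = df['ColC'].eq('X')          # True on the rows holding an X

# Step 1: for every row, the ColA value of the row directly below it
next_a = df['ColA'].shift(-1)

# Step 2: keep only the X rows, then shift within that subset so each X
# receives the value that belonged to the previous X (the first X gets NaN)
y = next_a[m].shift()

# Step 3: ColB on the X rows, minus the value carried over from the previous X
z = df.loc[m, 'ColB']
diff = z - y

# Step 4: spread back over the full index, filling every other row with 0
col_d = diff.reindex(df.index).fillna(0).astype(int)
print(col_d.tolist())
```

Rows before the first X and after the last X never enter the calculation, because only X rows are selected by the mask, so no explicit trimming of the frame is needed.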
Answer 2
Score: 0
Might this code be useful for your solution? It keeps the rows after the first X up to and including the last X, and subtracts ColA from ColB row by row.
import pandas as pd
df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G']
})
# Find the index labels of the first and last occurrence of 'X' in ColC
first_appear = df['ColC'].eq('X').idxmax()
last_appear = df['ColC'].eq('X')[::-1].idxmax()
# .copy() avoids a SettingWithCopyWarning when the new column is added
final_df = df.loc[first_appear+1:last_appear].copy()
final_df['ColD'] = final_df['ColB'] - final_df['ColA']
print(final_df)
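The snippet above subtracts within each row, whereas the question pairs each X with the row below the previous X. As a point of comparison, here is a hedged sketch of an explicit loop that follows the question's steps literally. The names x_pos, prev, curr and col_d are illustrative, and np.flatnonzero is used only to get the integer positions of the X rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G',
             'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
})

# Integer positions of every X in ColC
x_pos = np.flatnonzero(df['ColC'].eq('X').to_numpy())

col_d = np.zeros(len(df), dtype='int64')

# Walk over consecutive pairs of X positions: (1st, 2nd), (2nd, 3rd), ...
for prev, curr in zip(x_pos[:-1], x_pos[1:]):
    y = df['ColA'].iloc[prev + 1]   # ColA one row below the earlier X
    z = df['ColB'].iloc[curr]       # ColB on the later X
    col_d[curr] = z - y             # result goes on the later X's row

df['ColD'] = col_d
print(df)
```

The loop is easier to read than the shift-based version but will be slower on large frames, since it touches one pair of rows per iteration.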