pandas select rows and cell values based on column and other conditions

Question

How can I do the following in pandas?

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
    # ColD already holds the expected result
    'ColD': [0, 0, 0, 0, 0, 22222, 0, 0, 0, 0, 0, 48, 0, 0, 52, 0, 0, 0]
})
```

Disregard all rows before the first and after the last occurrence of X in ColC.

Find the first occurrence of X in ColC, move one row down, and take the ColA value from that row (call it Y; in df it is 11111).

Find the second occurrence of X in ColC and take the ColB value from that row (call it Z; in df it is 33333).

Write the result (Z - Y) into ColD on the row of the second occurrence of X (here 33333 - 11111 = 22222).

Repeat for each following pair of occurrences: the 2nd and 3rd X, the 3rd and 4th, the 4th and 5th, and so on until the end of df.

The ColD column in the DataFrame above shows the expected result.
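The steps above translate fairly directly into an explicit loop over consecutive pairs of X rows. This is a minimal, unvectorized sketch that works on integer positions, so it does not depend on the index labels. The helper name `fill_pairwise_diff` is hypothetical, and rows outside the first/last X simply keep 0.

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
})


def fill_pairwise_diff(frame: pd.DataFrame, marker: str = 'X') -> pd.DataFrame:
    """Hypothetical helper: for each pair of consecutive `marker` rows (prev, cur),
    write ColB[cur] - ColA[prev + 1] into ColD at row cur; every other row gets 0."""
    out = frame.copy()
    out['ColD'] = 0
    # integer positions of every marker row; rows before the first and
    # after the last marker are never written to
    pos = [i for i, v in enumerate(out['ColC']) if v == marker]
    col_a = out.columns.get_loc('ColA')
    col_b = out.columns.get_loc('ColB')
    col_d = out.columns.get_loc('ColD')
    for prev, cur in zip(pos, pos[1:]):
        y = out.iat[prev + 1, col_a]   # Y: ColA one row below the previous X
        z = out.iat[cur, col_b]        # Z: ColB on the current X
        out.iat[cur, col_d] = z - y
    return out


result = fill_pairwise_diff(df)
print(result['ColD'].tolist())
# [0, 0, 0, 0, 0, 22222, 0, 0, 0, 0, 0, 48, 0, 0, 52, 0, 0, 0]
```

Because `prev + 1 <= cur` always holds for consecutive X positions, the row below the previous X always exists. This makes the loop a convenient reference for checking a vectorized version.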

Answer 1

Score: 1

Using some shift-fu and boolean indexing:

```python
# boolean mask of the rows where ColC is X
m = df['ColC'].eq('X')
df['ColD'] = df.loc[m, 'ColB'] - df['ColA'].shift(-1)[m].shift()
```

Or, if you want 0s instead of NaN in the other rows:

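```python
m = df['ColC'].eq('X')
# subtract on the full column so non-X rows become NaN, then fill them with 0;
# fillna(..., downcast='infer') is deprecated in pandas 2.x, so cast explicitly
df['ColD'] = (df['ColB'].sub(df['ColA'].shift(-1)[m].shift())
                .fillna(0)
                .astype(int)
             )
```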
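Here is a self-contained run of both variants on the question's data, which may help when checking the result. It assumes unique index labels, since the final subtraction aligns on labels. It also swaps the deprecated `fillna(0, downcast='infer')` for `fillna(0).astype(int)`. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
})

m = df['ColC'].eq('X')
# Y for each X: ColA of the row just below the *previous* X (NaN for the first X)
prev_y = df['ColA'].shift(-1)[m].shift()

# Variant 1: only X rows get a value; reindexing leaves every other row NaN
nan_version = (df.loc[m, 'ColB'] - prev_y).reindex(df.index)

# Variant 2: the missing values become 0, then the column is cast back to int
zero_version = df['ColB'].sub(prev_y).fillna(0).astype(int)

print(nan_version.dropna().tolist())  # [22222.0, 48.0, 52.0]
print(zero_version.tolist())
```

The first variant stays float because it still contains NaN. Only the second can be cast back to integers.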

Output:

```
     ColA   ColB  ColC   ColD
0       1     11     A      0
1       2      5     B      0
2       3     22     X      0
3   11111     66     C      0
4       5      4     D      0
5       6  33333     X  22222
6       7     45     E      0
7       8     91     F      0
8       9     78     G      0
9       1     10     A      0
10      2     17     B      0
11      3     55     X     48
12      4     73     C      0
13      5     85     D      0
14      6     56     X     52
15      7     99     E      0
16      8      4     F      0
17      9     74     G      0
```

Intermediate:

```python
# get the value of the row following the previous X
df['ColA'].shift(-1)[m].shift()
```

```
2         NaN
5     11111.0
11        7.0
14        4.0
Name: ColA, dtype: float64
```
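To see how the shift chain pairs each Y with the *following* X, the three steps can be printed one at a time. This sketch uses the question's data, and the intermediate variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
})
m = df['ColC'].eq('X')

# Step 1: shift(-1) pulls each successor up, so row i now holds ColA[i + 1]
below = df['ColA'].shift(-1)
# Step 2: keep only the X rows -> ColA of the row right after each X (the Y values)
y_per_x = below[m]
# Step 3: shift() inside the filtered Series moves each Y onto the next X
y_for_next_x = y_per_x.shift()

print(y_per_x.tolist())       # [11111.0, 7.0, 4.0, 7.0]
print(y_for_next_x.tolist())  # [nan, 11111.0, 7.0, 4.0]
```

The last Y (7.0, taken from row 15) drops off in step 3 because no X follows it. That matches the requirement to ignore everything after the last X.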

Answer 2

Score: 0

Might this code be useful for your solution? I also attached a screenshot of the output so that you have a clear idea of how it works.

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G']
})
# Find the index labels of the first and last occurrence of 'X' in ColC
first_appear = df['ColC'].eq('X').idxmax()
last_appear = df['ColC'].eq('X')[::-1].idxmax()
# Keep the rows after the first X up to and including the last X;
# .copy() avoids a SettingWithCopyWarning on the assignment below
final_df = df.loc[first_appear+1:last_appear].copy()
# Row-wise difference ColB - ColA for every row in that range
final_df['ColD'] = final_df['ColB'] - final_df['ColA']
print(final_df)
```

[Screenshot: output of the code above]
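The trimming step relies on `idxmax` returning the label of the first maximum. On a boolean Series that is the first `True`, and on the reversed Series it is the last `True`. If ColC contains no X at all, every value is `False` and `idxmax` simply returns the first label, so a guard may be worth adding. A small sketch of that behaviour follows. The helper name `x_bounds` is hypothetical.

```python
import pandas as pd


def x_bounds(col: pd.Series, marker: str = 'X'):
    """Hypothetical helper: (first, last) index labels where col == marker, or None."""
    mask = col.eq(marker)
    if not mask.any():
        # on an all-False mask idxmax would return the first label, which is misleading
        return None
    return mask.idxmax(), mask[::-1].idxmax()


col = pd.Series(['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G',
                 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'])
first, last = x_bounds(col)
print(first, last)                           # 2 14
print(x_bounds(pd.Series(['A', 'B', 'C'])))  # None
```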

huangapple
  • This article was published on July 7, 2023 at 01:56:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76631421.html