

Find the sum of values in one column over the rows where another column is NaN in pandas






I have a dataframe with columns A and B. Column A has gaps (some of its rows are NaN), while B has a value in every row. I would like to create a third column C where, for each run of consecutive NaN rows in A, the next row with a valid A holds the sum of B over those NaN rows plus its own value of B.
All other values in C should be NaN where A is NaN, and the value of B for rows that follow a valid number in A.

    import pandas as pd

    data = {
        'A': [1, 1, None, None, 2, 5, None, None, 3, 4, 3, None, 5],
        'B': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130],
    }
    df = pd.DataFrame(data)

Everything works except for the rows that need the sum of B plus the next valid value in B.
This is the code I am using; it seems to have turned into a mess by now:

    result = df.groupby(df['A'].isnull().cumsum())['B'].sum().reset_index()
    df_result = pd.DataFrame({'C': result['Pumped']})
    df_result.loc[1:, 'C'] -= result.loc[0, 'Pumped']
    df.loc[~mask, 'C'] = df.loc[~mask, 'Pumped']
    valid_rows_after_nan = df['dWL'].notnull() & mask.shift(1).fillna(False)
    df.loc[valid_rows_after_nan, 'C'] = df_result
    print(df)

I would like the output to look like this:

    data = {
        'A': [1, 1, None, None, 2, 5, None, None, 3, 4, 3, None, 5],
        'B': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130],
        'C': [10, 20, None, None, 120, 60, None, None, 240, 100, 110, None, 250],
    }


Score: 4



A simple version using groupby.transform:

    # identify the non-NA and reverse
    m = df.loc[::-1, 'A'].notna()
    # group the preceding NA, sum, mask where NA
    df['C'] = df.groupby(m.cumsum())['B'].transform('sum').where(m)


          A    B      C
    0   1.0   10   10.0
    1   1.0   20   20.0
    2   NaN   30    NaN
    3   NaN   40    NaN
    4   2.0   50  120.0
    5   5.0   60   60.0
    6   NaN   70    NaN
    7   NaN   80    NaN
    8   3.0   90  240.0
    9   4.0  100  100.0
    10  3.0  110  110.0
    11  NaN  120    NaN
    12  5.0  130  250.0
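To see why the reversed `cumsum` groups each run of NaN with the valid row that follows it, the intermediate values can be laid out side by side. This is only an illustrative sketch: the helper column names `has_A`, `group` and `group_sum` are made up here, and the forward-only key at the end (`key_fwd`) is an assumed alternative, not part of the answer above.

```python
import pandas as pd

data = {
    'A': [1, 1, None, None, 2, 5, None, None, 3, 4, 3, None, 5],
    'B': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130],
}
df = pd.DataFrame(data)

# True where A holds a value, evaluated bottom-to-top (index 12 down to 0)
m = df.loc[::-1, 'A'].notna()

# Seen from the bottom, every non-NA row starts a new group, so the NaN
# rows directly above it end up with the same group label
key = m.cumsum()

# Hypothetical helper columns, aligned back to df by index
steps = df.assign(
    has_A=m,
    group=key,
    group_sum=df.groupby(key)['B'].transform('sum'),
)
# Keep the group sum only on rows where A is valid
steps['C'] = steps['group_sum'].where(steps['has_A'])
print(steps)

# Assumed alternative: build an equivalent key top-to-bottom by starting
# a new group on the row right after each valid A
key_fwd = df['A'].notna().shift(fill_value=False).cumsum()
c_fwd = df.groupby(key_fwd)['B'].transform('sum').where(df['A'].notna())
print(c_fwd)
```

Both keys put rows 2, 3 and 4 in one group (30 + 40 + 50 = 120) and rows 11 and 12 in another (120 + 130 = 250); the group labels themselves differ, but only the grouping matters for `transform('sum')`.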

  • Posted on July 27, 2023 at 15:38:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76777484.html



