How to sum area if a threshold is reached in pandas dataframe?





I have a pandas data frame df. For each state, I want to sum the area (in hectares) that needs to be harvested before a threshold day of year (doy) is reached. The threshold for each state is stored in a second pandas data frame, lst.

    import pandas as pd

    lst = pd.DataFrame()
    lst['ST'] = ['CA', 'MA', 'TX', 'FL', 'OH', 'WY', 'AK']
    lst['doy'] = [140, 150, 160, 170, 180, 190, 200]

    print(df)
         doy  ST  ...          area           left
    0    111  AK  ...  4.293174e+05  760964.996900
    1    120  AK  ...  4.722491e+06  760535.679500
    2    121  AK  ...  8.586347e+06  760149.293900
    3    122  AK  ...  2.683233e+07  758324.695200
    4    122  AK  ...  2.962290e+07  758045.638900
    ..   ...  ..  ...           ...            ...
    111  211  AK  ...  7.609006e+09     107.329336
    112  212  AK  ...  7.609221e+09      85.863469
    113  213  AK  ...  7.609435e+09      64.397602
    114  214  AK  ...  7.609650e+09      42.931735
    115  215  AK  ...  7.610079e+09       0.000000

I would like to end up with a data frame that, for each state, sums all the area recorded before that state's threshold doy in lst:

    area       ST
    5.0000+05  CA
    4.0123+05  MA
    3.1941+05  TX
    4.0011+05  FL
    1.2346+05  OH
    87.318+05  WY
    0.7133+05  AK

How can I achieve this?


Score: 1

You can map the ST column through a Series built from lst to get each row's threshold, keep only the rows where df['doy'] is less than that threshold by boolean indexing, and then aggregate with sum:

    df1 = (df[df['doy'].lt(df['ST'].map(lst.set_index('ST')['doy']))]
             .groupby('ST', as_index=False)['area'].sum()[['area', 'ST']])
    print(df1)
             area  ST
    0  70193385.4  AK
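
To make the individual steps easier to follow, here is a minimal, self-contained sketch of the same approach. The sample values in df and lst are hypothetical (they are not the asker's data), and the intermediate names threshold, mask and result are introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real values.
df = pd.DataFrame({
    "doy":  [111, 120, 205, 130, 145, 139],
    "ST":   ["AK", "AK", "AK", "CA", "CA", "CA"],
    "area": [10.0, 20.0, 40.0, 1.0, 2.0, 4.0],
})
lst = pd.DataFrame({"ST": ["CA", "AK"], "doy": [140, 200]})

# Look up each row's threshold by state; the result is aligned with df's index.
threshold = df["ST"].map(lst.set_index("ST")["doy"])

# True for rows whose doy is strictly before the state's threshold.
mask = df["doy"].lt(threshold)

# Keep those rows, sum area per state, and order the columns as in the question.
result = df[mask].groupby("ST", as_index=False)["area"].sum()[["area", "ST"]]
print(result)
```

With this sample data, AK keeps the rows with doy 111 and 120 (area 30.0) and CA keeps the rows with doy 130 and 139 (area 5.0). A state that is missing from lst maps to NaN, the comparison is False for its rows, and it drops out of the result. A state with no rows before its threshold is likewise absent from the output.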


Score: 0



If I understood you correctly, you should filter df by doy, then group by ST and sum.

Here is an example with doy before 108:

    doy_threshold = 108
    df[df['doy'] < doy_threshold].groupby(by=["ST"]).sum()
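
As a rough, runnable illustration of this single-threshold variant, the sketch below uses hypothetical sample data and a hypothetical result name, totals. It selects the area column before summing, because a plain .sum() on the grouped frame would also add up the other numeric columns, such as doy. Note that this applies one threshold to every state, whereas the question asks for a different threshold per state.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real values.
df = pd.DataFrame({
    "doy":  [100, 105, 110, 101, 120],
    "ST":   ["AK", "AK", "AK", "CA", "CA"],
    "area": [1.5, 2.5, 4.0, 3.0, 5.0],
})

doy_threshold = 108

# Keep rows before the threshold, then sum only the area column per state.
totals = df[df["doy"] < doy_threshold].groupby("ST")["area"].sum()
print(totals)
```

Here totals is a Series indexed by ST. AK sums the rows with doy 100 and 105, and CA keeps only the row with doy 101.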

  • Posted on February 16, 2023, 17:34:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75470298.html



