
How to sum area until a threshold is reached in a pandas DataFrame?

Question


I have a pandas DataFrame `df`, and for each state I want to find the sum of hectares that need to be harvested (`area`) before the threshold day of year (`doy`) given in another DataFrame, `lst`, is reached.

```python
import pandas as pd

lst = pd.DataFrame()
lst['ST'] = ['CA', 'MA', 'TX', 'FL', 'OH', 'WY', 'AK']
lst['doy'] = [140, 150, 160, 170, 180, 190, 200]
```

```python
print(df)
```

```
     doy  ST  ...          area           left
0    111  AK  ...  4.293174e+05  760964.996900
1    120  AK  ...  4.722491e+06  760535.679500
2    121  AK  ...  8.586347e+06  760149.293900
3    122  AK  ...  2.683233e+07  758324.695200
4    122  AK  ...  2.962290e+07  758045.638900
..   ...  ..  ...           ...            ...
111  211  AK  ...  7.609006e+09     107.329336
112  212  AK  ...  7.609221e+09      85.863469
113  213  AK  ...  7.609435e+09      64.397602
114  214  AK  ...  7.609650e+09      42.931735
115  215  AK  ...  7.610079e+09       0.000000
```

So I would end up with a DataFrame that sums up all the `area` values before each state's threshold `doy` in `lst`, one row per state:

```
        area  ST
  5.0000e+05  CA
  4.0123e+05  MA
  3.1941e+05  TX
  4.0011e+05  FL
  1.2346e+05  OH
  87.318e+05  WY
  0.7133e+05  AK
```

How can I achieve this?

Answer 1

Score: 1


You can map the `ST` column with a Series built from `lst` (indexed by state), compare `df['doy']` against the mapped thresholds with `lt` (less than), filter the rows with boolean indexing, and then aggregate with `sum`:

```python
# Map each row's state to its threshold doy, keep rows before it, sum area per state
df1 = (df[df['doy'].lt(df['ST'].map(lst.set_index('ST')['doy']))]
         .groupby('ST', as_index=False)['area'].sum()[['area', 'ST']])
print(df1)
```

```
         area  ST
0  70193385.4  AK
```
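A self-contained sketch of the same `map` plus boolean-indexing approach is below. The `df` rows and `area` values are hypothetical sample data, not the asker's. It also adds a step that is not part of the answer: reindexing the result on `lst['ST']` with `fill_value=0`, so that states with no rows before their threshold still appear, as the desired output in the question implies.

```python
import pandas as pd

# Per-state threshold day of year (values from the question)
lst = pd.DataFrame({
    'ST': ['CA', 'MA', 'TX', 'FL', 'OH', 'WY', 'AK'],
    'doy': [140, 150, 160, 170, 180, 190, 200],
})

# Hypothetical sample data: a few rows for two states only
df = pd.DataFrame({
    'doy': [111, 120, 150, 205, 130, 145],
    'ST': ['AK', 'AK', 'AK', 'AK', 'CA', 'CA'],
    'area': [10.0, 20.0, 30.0, 40.0, 5.0, 7.0],
})

# Look up each row's threshold by state and keep rows strictly before it
thresholds = df['ST'].map(lst.set_index('ST')['doy'])
before = df[df['doy'].lt(thresholds)]

# Sum area per state, then list every state from lst (missing ones get 0)
result = (before.groupby('ST')['area'].sum()
                .reindex(lst['ST'], fill_value=0)
                .rename_axis('ST')
                .reset_index())
result = result[['area', 'ST']]
print(result)
```

Here AK sums to 60.0 (the row at doy 205 is past its threshold of 200) and CA to 5.0 (doy 145 is past 140). Using `lt` keeps the comparison strictly "before" the threshold day; `le` would also count the threshold day itself.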

Answer 2

Score: 0


If I understand you correctly, you should filter `df` by `doy`, then group by `ST` and sum.

Here is an example with a single threshold, keeping rows with `doy` before 108:

```python
doy_threshold = 108  # one cutoff applied to every state
# Keep rows before the cutoff, then sum area per state
df[df['doy'] < doy_threshold].groupby('ST')['area'].sum()
```
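The example above applies one threshold to every state, while the question has a different `doy` per state in `lst`. One way to extend it, sketched here as an alternative to the `map` approach in the first answer, is to merge the thresholds onto `df` and filter on the merged column. The column name `doy_threshold` and the sample rows are hypothetical.

```python
import pandas as pd

lst = pd.DataFrame({
    'ST': ['CA', 'MA', 'TX', 'FL', 'OH', 'WY', 'AK'],
    'doy': [140, 150, 160, 170, 180, 190, 200],
})

# Hypothetical sample rows
df = pd.DataFrame({
    'doy': [111, 120, 150, 205, 130, 145],
    'ST': ['AK', 'AK', 'AK', 'AK', 'CA', 'CA'],
    'area': [10.0, 20.0, 30.0, 40.0, 5.0, 7.0],
})

# Attach each state's own threshold as a new column
merged = df.merge(lst.rename(columns={'doy': 'doy_threshold'}), on='ST', how='left')

# Keep rows before the state's threshold, then sum area per state
totals = merged.loc[merged['doy'] < merged['doy_threshold']].groupby('ST')['area'].sum()
print(totals)
```

With `how='left'`, a state in `df` that is missing from `lst` gets a NaN threshold; the comparison is then False, so those rows are dropped rather than raising an error.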

huangapple
  • Published on February 16, 2023 at 17:34:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75470298.html