How to sum area if a threshold is reached in pandas dataframe?

Question

I have a pandas DataFrame df and, for each state, I want to sum the area (hectares that need to be harvested) before the threshold day of year (doy) given in another DataFrame, lst, is reached.

import pandas as pd

lst = pd.DataFrame()
lst['ST'] = ['CA', 'MA', 'TX', 'FL', 'OH', 'WY', 'AK']
lst['doy'] = [140, 150, 160, 170, 180, 190, 200]
print(df)
            doy  ST  ...          area  left
0           111  AK  ...  4.293174e+05  760964.996900
1           120  AK  ...  4.722491e+06  760535.679500
2           121  AK  ...  8.586347e+06  760149.293900
3           122  AK  ...  2.683233e+07  758324.695200
4           122  AK  ...  2.962290e+07  758045.638900
..          ... ...  ...           ...            ...
111         211  AK  ...  7.609006e+09     107.329336
112         212  AK  ...  7.609221e+09      85.863469
113         213  AK  ...  7.609435e+09      64.397602
114         214  AK  ...  7.609650e+09      42.931735
115         215  AK  ...  7.610079e+09       0.000000

So I would end up with a DataFrame that sums all the area before the threshold doy in lst, per state:

     area       ST
   5.0000+05      CA
   4.0123+05      MA
   3.1941+05      TX
   4.0011+05      FL
   1.2346+05      OH
   87.318+05      WY
   0.7133+05      AK

How can I achieve this?

Answer 1

Score: 1

You can map the ST column through a Series built from lst to get each row's threshold, keep the rows where df['doy'] is less than that threshold using boolean indexing, and then aggregate with sum:

# Look up each row's threshold by state, keep rows strictly before it,
# then sum area per state.
df1 = (df[df['doy'].lt(df['ST'].map(lst.set_index('ST')['doy']))]
        .groupby('ST', as_index=False)['area'].sum()[['area', 'ST']])
print(df1)
         area  ST
0  70193385.4  AK
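
For readers who want to run the idea end to end, here is a minimal sketch of the same map, filter, and group steps on made-up data. The values in df and lst below are hypothetical stand-ins (the question's full df is not shown), and the intermediate name threshold is introduced only for readability.

```python
import pandas as pd

# Hypothetical toy data; the real df in the question has more columns and rows.
df = pd.DataFrame({
    "doy":  [130, 139, 140, 145, 149, 155, 199, 200],
    "ST":   ["CA", "CA", "CA", "MA", "MA", "MA", "AK", "AK"],
    "area": [1.0, 2.0, 4.0, 10.0, 20.0, 40.0, 100.0, 200.0],
})
lst = pd.DataFrame({"ST": ["CA", "MA", "AK"], "doy": [140, 150, 200]})

# Per-row threshold: look up each row's state in lst.
threshold = df["ST"].map(lst.set_index("ST")["doy"])

# Keep rows strictly before the threshold, then sum area per state.
result = (
    df[df["doy"].lt(threshold)]
    .groupby("ST", as_index=False)["area"]
    .sum()[["area", "ST"]]
)
print(result)
```

Note that lt is a strict comparison, so a row whose doy equals the threshold is excluded; use le instead if the threshold day itself should count. A state that appears in df but not in lst gets a NaN threshold and drops out of the result, because comparisons against NaN evaluate to False.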

Answer 2

Score: 0

If I understand you correctly, you should filter df by doy, then group by ST and sum.

Here is an example that keeps rows with doy before 108:

doy_threshold = 108
df[df['doy'] < doy_threshold].groupby(by=["ST"]).sum()
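
This answer applies one threshold to every state, whereas the question asks for a different threshold per state. As a rough sketch of how the same filter-then-group idea might be extended, the example below first repeats the scalar version (summing only the area column) and then attaches each state's threshold with a left merge. The data is invented for illustration, and the column name doy_max is a hypothetical choice, not something from the original post.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's df and lst.
df = pd.DataFrame({
    "doy":  [100, 107, 108, 120, 105, 110],
    "ST":   ["AK", "AK", "AK", "AK", "WY", "WY"],
    "area": [1.0, 2.0, 4.0, 8.0, 16.0, 32.0],
})
lst = pd.DataFrame({"ST": ["AK", "WY"], "doy": [120, 110]})

# Single scalar threshold, as in the answer, but summing only "area".
doy_threshold = 108
fixed = df[df["doy"] < doy_threshold].groupby("ST")["area"].sum()

# Per-state threshold: attach each state's threshold via a left merge.
# "doy_max" is a hypothetical column name used to avoid clashing with df["doy"].
merged = df.merge(lst.rename(columns={"doy": "doy_max"}), on="ST", how="left")
per_state = merged[merged["doy"] < merged["doy_max"]].groupby("ST")["area"].sum()

print(fixed)
print(per_state)
```

Selecting ["area"] before calling sum keeps the aggregation to the one column of interest, which avoids trying to sum unrelated or non-numeric columns in the real df.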

huangapple
  • Posted on February 16, 2023 at 17:34:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75470298.html