How to sum area if a threshold is reached in pandas dataframe?
Question
I have a pandas DataFrame df and want to find, per state, the sum of hectares that need to be harvested (area) before the threshold day of year (doy) given in another DataFrame, lst, is reached.
import pandas as pd

lst = pd.DataFrame()
lst['ST'] = ['CA', 'MA', 'TX', 'FL', 'OH', 'WY', 'AK']
lst['doy'] = [140, 150, 160, 170, 180, 190, 200]
print(df)
     doy  ST  ...          area           left
0    111  AK  ...  4.293174e+05  760964.996900
1    120  AK  ...  4.722491e+06  760535.679500
2    121  AK  ...  8.586347e+06  760149.293900
3    122  AK  ...  2.683233e+07  758324.695200
4    122  AK  ...  2.962290e+07  758045.638900
..   ...  ..  ...           ...            ...
111  211  AK  ...  7.609006e+09     107.329336
112  212  AK  ...  7.609221e+09      85.863469
113  213  AK  ...  7.609435e+09      64.397602
114  214  AK  ...  7.609650e+09      42.931735
115  215  AK  ...  7.610079e+09       0.000000
I would like to end up with a DataFrame that, for each state, sums all the area recorded before that state's threshold doy in lst:
area ST
5.0000+05 CA
4.0123+05 MA
3.1941+05 TX
4.0011+05 FL
1.2346+05 OH
87.318+05 WY
0.7133+05 AK
How can I achieve this?
Answer 1
Score: 1
You can map the ST column to its threshold using a Series built from lst, test whether df['doy'] is less than that threshold, filter with boolean indexing, and aggregate with sum:
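# Look up each row's state in lst to get its threshold, then keep earlier rows
thresholds = df['ST'].map(lst.set_index('ST')['doy'])
df1 = (df[df['doy'].lt(thresholds)]
       .groupby('ST', as_index=False)['area'].sum()[['area', 'ST']])
print(df1)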
area ST
0 70193385.4 AK
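To see the pattern above end to end, here is a minimal self-contained sketch. The rows in df are invented toy values, not the asker's data, and the 'NY' row is an assumption added only to show what happens to a state that has no entry in lst.

```python
import pandas as pd

# Thresholds per state, as given in the question.
lst = pd.DataFrame({
    'ST': ['CA', 'MA', 'TX', 'FL', 'OH', 'WY', 'AK'],
    'doy': [140, 150, 160, 170, 180, 190, 200],
})

# Toy data for illustration only (hypothetical values).
df = pd.DataFrame({
    'doy': [130, 139, 140, 199, 200, 210, 100],
    'ST': ['CA', 'CA', 'CA', 'AK', 'AK', 'AK', 'NY'],
    'area': [1.0, 2.0, 4.0, 10.0, 20.0, 40.0, 5.0],
})

# Per-row threshold: each row's state looked up in lst (NaN if the state is missing).
thresholds = df['ST'].map(lst.set_index('ST')['doy'])

# Strictly before the threshold; comparisons against NaN are False.
mask = df['doy'].lt(thresholds)

df1 = df[mask].groupby('ST', as_index=False)['area'].sum()[['area', 'ST']]
print(df1)
```

With this toy data only the rows strictly before each threshold are summed, so CA contributes 1.0 + 2.0 and AK contributes 10.0. The 'NY' row is dropped because its mapped threshold is NaN. If rows on the threshold day itself should count, le could be used in place of lt.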
Answer 2
Score: 0
If I understood you correctly, you should filter df by doy, then group by ST and sum.
Here is an example that keeps only rows with doy below 108:
doy_threshold = 108
df[df['doy'] < doy_threshold].groupby(by=["ST"]).sum()
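Two things are worth noting about this approach: calling sum() on the whole group adds up every numeric column (doy included), and a single constant does not cover a different threshold per state. The sketch below is one possible way to handle both, not part of the original answer: it selects area before summing, and it swaps in a merge with lst to get per-state thresholds. The toy rows and the column name doy_threshold are assumptions made for illustration.

```python
import pandas as pd

# Subset of the question's thresholds, kept small for the example.
lst = pd.DataFrame({'ST': ['CA', 'AK'], 'doy': [140, 200]})

# Toy data for illustration only (hypothetical values).
df = pd.DataFrame({
    'doy': [100, 150, 150, 250],
    'ST': ['CA', 'CA', 'AK', 'AK'],
    'area': [1.0, 2.0, 3.0, 4.0],
})

# Fixed threshold, summing only the area column.
doy_threshold = 108
fixed = df[df['doy'] < doy_threshold].groupby('ST')['area'].sum()

# Per-state threshold: attach each state's threshold via merge, then filter.
limits = lst.rename(columns={'doy': 'doy_threshold'})  # hypothetical column name
merged = df.merge(limits, on='ST', how='inner')
before = merged[merged['doy'] < merged['doy_threshold']]
result = before.groupby('ST', as_index=False)['area'].sum()

print(fixed)
print(result)
```

The inner merge drops any state that does not appear in lst, which matches what the map-based filter does for unmapped states.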