循环遍历数据框以限制和汇总另一个数据框

huangapple go评论141阅读模式
英文:

Looping over data frame to cap and sum another data frame

问题

  1. # 代码:
  2. import numpy as np
  3. import pandas as pd
  4. df1 = pd.DataFrame({'Caps':['25','50','100']})
  5. df2 = pd.DataFrame({'Amounts':['45','25','65','35','85','105','80'],
  6. 'Type': ['a' ,'b' ,'b' ,'c' ,'a' , 'b' ,'d' ]})
  7. df3 = pd.DataFrame({'Type': ['a' ,'b' ,'c' ,'d']})
  8. df1['Caps'] = df1['Caps'].astype(float)
  9. df2['Amounts'] = df2['Amounts'].astype(float)
  10. for index1, row1 in df1.iterrows():
  11. for index2, row2 in df3.iterrows():
  12. df3[str(int(row1['Caps']))+'limit'] = df2['Amounts'].where(
  13. df2['Type'] == row2['Type']).where(
  14. df2['Amounts'] <= row1['Caps'], row1['Caps']).sum()
  15. # 期望的输出应该是这样的:
  16. df3 = pd.DataFrame({'Type':['a','b','c','d'],
  17. 'Total':['130','195','35','80'],
  18. '25limit':['50','75','25','25'],
  19. '50limit':['95','125','35','50'],
  20. '100limit':['130','190','35','80'],
  21. })
  22. # 输出:
  23. df3

输出结果:

  1. Type Total 25limit 50limit 100limit
  2. 0 a 130 50 95 130
  3. 1 b 195 75 125 190
  4. 2 c 35 25 35 35
  5. 3 d 80 25 50 80
英文:

I am trying to use entries from df1 to limit amounts in df2, then add them up based on their type and summarize in df3. I'm not sure how to get it, the for loop using iterrows would be my best guess but it's not complete.

Code:

  1. import numpy as np
  2. import pandas as pd
  3. df1 = pd.DataFrame({&#39;Caps&#39;:[&#39;25&#39;,&#39;50&#39;,&#39;100&#39;]})
  4. df2 = pd.DataFrame({&#39;Amounts&#39;:[&#39;45&#39;,&#39;25&#39;,&#39;65&#39;,&#39;35&#39;,&#39;85&#39;,&#39;105&#39;,&#39;80&#39;], \
  5. &#39;Type&#39;: [&#39;a&#39; ,&#39;b&#39; ,&#39;b&#39; ,&#39;c&#39; ,&#39;a&#39; , &#39;b&#39; ,&#39;d&#39; ]})
  6. df3 = pd.DataFrame({&#39;Type&#39;: [&#39;a&#39; ,&#39;b&#39; ,&#39;c&#39; ,&#39;d&#39;]})
  7. df1[&#39;Caps&#39;] = df1[&#39;Caps&#39;].astype(float)
  8. df2[&#39;Amounts&#39;] = df2[&#39;Amounts&#39;].astype(float)
  9. for index1, row1 in df1.iterrows():
  10. for index2, row2 in df3.iterrows():
  11. df3[str(row1[&#39;Caps&#39;]+&#39;limit&#39;)] = df2[&#39;Amounts&#39;].where(
  12. df2[&#39;Type&#39;] == row2[&#39;Type&#39;]).where(
  13. df2[&#39;Amounts&#39;]&lt;= row1[&#39;Caps&#39;], row1[&#39;Caps&#39;]).sum()
  14. # My ideal output would be this:
  15. df3 = pd.DataFrame({&#39;Type&#39;:[&#39;a&#39;,&#39;b&#39;,&#39;c&#39;,&#39;d&#39;],
  16. &#39;Total&#39;:[&#39;130&#39;,&#39;195&#39;,&#39;35&#39;,&#39;80&#39;],
  17. &#39;25limit&#39;:[&#39;50&#39;,&#39;75&#39;,&#39;25&#39;,&#39;25&#39;],
  18. &#39;50limit&#39;:[&#39;95&#39;,&#39;125&#39;,&#39;35&#39;,&#39;50&#39;],
  19. &#39;100limit&#39;:[&#39;130&#39;,&#39;190&#39;,&#39;35&#39;,&#39;80&#39;],
  20. })

Output:

  1. &gt;&gt;&gt; df3
  2. Type Total 25limit 50limit 100limit
  3. 0 a 130 50 95 130
  4. 1 b 195 75 125 190
  5. 2 c 35 25 35 35
  6. 3 d 80 25 50 80

答案1

得分: 3

使用numpy来比较所有的Amounts值与Caps值,通过广播到2D数组a,然后使用构造函数创建DataFrame,每列求和,然后转置使用DataFrame.TDataFrame.add_prefix

对于聚合列,使用DataFrame.insert来插入第一列,使用GroupBy.sum

  1. df1['Caps'] = df1['Caps'].astype(int)
  2. df2['Amounts'] = df2['Amounts'].astype(int)
  3. am = df2['Amounts'].to_numpy()
  4. ca = df1['Caps'].to_numpy()
  5. #a = np.where(am &lt;= ca[:, None], am[None, :], ca[:, None])
  6. a = np.where(am <= ca[:, None], am[None, :], ca[:, None])
  7. df1 = (pd.DataFrame(a,columns=df2['Type'],index=df1['Caps'])
  8. .sum(axis=1, level=0).T.add_suffix('limit'))
  9. df1.insert(0, 'Total', df2.groupby('Type')['Amounts'].sum())
  10. df1 = df1.reset_index().rename_axis(None, axis=1)
  11. print (df1)
  12. Type Total 25limit 50limit 100limit
  13. 0 a 130 50 95 130
  14. 1 b 195 75 125 190
  15. 2 c 35 25 35 35
  16. 3 d 80 25 50 80

请注意,这是给定代码的翻译,其中包括代码注释。

英文:

Use numpy for compare all values Amounts with Caps by broadcasting to 2d array a, then create DataFrame by constructor with sum per columns, transpose by DataFrame.T and DataFrame.add_prefix.

For aggregated column use DataFrame.insert for first column with GroupBy.sum:

  1. df1[&#39;Caps&#39;] = df1[&#39;Caps&#39;].astype(int)
  2. df2[&#39;Amounts&#39;] = df2[&#39;Amounts&#39;].astype(int)
  3. am = df2[&#39;Amounts&#39;].to_numpy()
  4. ca = df1[&#39;Caps&#39;].to_numpy()
  5. #pandas below 0.24
  6. #am = df2[&#39;Amounts&#39;].values
  7. #ca = df1[&#39;Caps&#39;].values
  8. a = np.where(am &lt;= ca[:, None], am[None, :], ca[:, None])
  9. df1 = (pd.DataFrame(a,columns=df2[&#39;Type&#39;],index=df1[&#39;Caps&#39;])
  10. .sum(axis=1, level=0).T.add_suffix(&#39;limit&#39;))
  11. df1.insert(0, &#39;Total&#39;, df2.groupby(&#39;Type&#39;)[&#39;Amounts&#39;].sum())
  12. df1 = df1.reset_index().rename_axis(None, axis=1)
  13. print (df1)
  14. Type Total 25limit 50limit 100limit
  15. 0 a 130 50 95 130
  16. 1 b 195 75 125 190
  17. 2 c 35 25 35 35
  18. 3 d 80 25 50 80

答案2

得分: 0

以下是已翻译的内容:

这是我的解决方案,没有使用numpy,但比@jezrael的解决方案慢两倍,10.5毫秒对比5.07毫秒。

  1. limcols = df1.Caps.to_list()
  2. df2 = df2.reindex(columns=["Amounts", "Type"] + limcols)
  3. df2[limcols] = df2[limcols].transform(
  4. lambda sc: np.where(df2.Amounts.le(sc.name), df2.Amounts, sc.name))
  5. # Summations:
  6. g = df2.groupby("Type")
  7. df3 = g[limcols].sum()
  8. df3.insert(0, "Total", g.Amounts.sum())
  9. # Renaming columns:
  10. c_dic = {lim: f"{lim:.0f}limit" for lim in limcols}
  11. df3 = df3.rename(columns=c_dic).reset_index()
  12. # Cleanup:
  13. # df2 = df2.drop(columns=limcols)
英文:

Here is my solution without numpy, however it is two times slower than @jezrael's solution, 10.5ms vs. 5.07ms.

  1. limcols= df1.Caps.to_list()
  2. df2=df2.reindex(columns=[&quot;Amounts&quot;,&quot;Type&quot;]+limcols)
  3. df2[limcols]= df2[limcols].transform( \
  4. lambda sc: np.where(df2.Amounts.le(sc.name),df2.Amounts,sc.name))
  5. # Summations:
  6. g=df2.groupby(&quot;Type&quot;)
  7. df3= g[limcols].sum()
  8. df3.insert(0,&quot;Total&quot;, g.Amounts.sum())
  9. # Renaming columns:
  10. c_dic={ lim:f&quot;{lim:.0f}limit&quot; for lim in limcols}
  11. df3= df3.rename(columns=c_dic).reset_index()
  12. # Cleanup:
  13. #df2=df2.drop(columns=limcols)

huangapple
  • 本文由 发表于 2020年1月3日 15:14:59
  • 转载请务必保留本文链接:https://go.coder-hub.com/59574571.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定