循环遍历数据框以限制和汇总另一个数据框

huangapple go评论98阅读模式
英文:

Looping over data frame to cap and sum another data frame

问题

# 代码:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Caps':['25','50','100']})
df2 = pd.DataFrame({'Amounts':['45','25','65','35','85','105','80'],
                    'Type':   ['a' ,'b' ,'b' ,'c' ,'a' , 'b' ,'d' ]})
df3 = pd.DataFrame({'Type':   ['a' ,'b' ,'c' ,'d']})

df1['Caps'] = df1['Caps'].astype(float)
df2['Amounts'] = df2['Amounts'].astype(float)

for index1, row1 in df1.iterrows():
    for index2, row2 in df3.iterrows():
        df3[str(int(row1['Caps']))+'limit'] = df2['Amounts'].where(
            df2['Type'] == row2['Type']).where(
            df2['Amounts'] <= row1['Caps'], row1['Caps']).sum()


# 期望的输出应该是这样的:

df3 = pd.DataFrame({'Type':['a','b','c','d'],
                    'Total':['130','195','35','80'],
                    '25limit':['50','75','25','25'],
                    '50limit':['95','125','35','50'],
                    '100limit':['130','190','35','80'],
                    })

# 输出:
df3

输出结果:

  Type Total 25limit 50limit 100limit
0    a   130      50      95      130
1    b   195      75     125      190
2    c    35      25      35       35
3    d    80      25      50       80
英文:

I am trying to use entries from df1 to limit amounts in df2, then add them up based on their type and summarize in df3. I'm not sure how to get it, the for loop using iterrows would be my best guess but it's not complete.

Code:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({&#39;Caps&#39;:[&#39;25&#39;,&#39;50&#39;,&#39;100&#39;]})
df2 = pd.DataFrame({&#39;Amounts&#39;:[&#39;45&#39;,&#39;25&#39;,&#39;65&#39;,&#39;35&#39;,&#39;85&#39;,&#39;105&#39;,&#39;80&#39;], \
                    &#39;Type&#39;:   [&#39;a&#39; ,&#39;b&#39; ,&#39;b&#39; ,&#39;c&#39; ,&#39;a&#39; , &#39;b&#39; ,&#39;d&#39; ]})
df3 = pd.DataFrame({&#39;Type&#39;:   [&#39;a&#39; ,&#39;b&#39; ,&#39;c&#39; ,&#39;d&#39;]})

df1[&#39;Caps&#39;] = df1[&#39;Caps&#39;].astype(float)
df2[&#39;Amounts&#39;] = df2[&#39;Amounts&#39;].astype(float)

for index1, row1 in df1.iterrows():
    for index2, row2 in df3.iterrows():
        df3[str(row1[&#39;Caps&#39;]+&#39;limit&#39;)] = df2[&#39;Amounts&#39;].where(
            df2[&#39;Type&#39;] == row2[&#39;Type&#39;]).where(
            df2[&#39;Amounts&#39;]&lt;= row1[&#39;Caps&#39;], row1[&#39;Caps&#39;]).sum()


# My ideal output would be this:

df3 = pd.DataFrame({&#39;Type&#39;:[&#39;a&#39;,&#39;b&#39;,&#39;c&#39;,&#39;d&#39;],
                    &#39;Total&#39;:[&#39;130&#39;,&#39;195&#39;,&#39;35&#39;,&#39;80&#39;],
                    &#39;25limit&#39;:[&#39;50&#39;,&#39;75&#39;,&#39;25&#39;,&#39;25&#39;],
                    &#39;50limit&#39;:[&#39;95&#39;,&#39;125&#39;,&#39;35&#39;,&#39;50&#39;],
                    &#39;100limit&#39;:[&#39;130&#39;,&#39;190&#39;,&#39;35&#39;,&#39;80&#39;],
                    })

Output:

&gt;&gt;&gt; df3

  Type Total 25limit 50limit 100limit
0    a   130      50      95      130
1    b   195      75     125      190
2    c    35      25      35       35
3    d    80      25      50       80

答案1

得分: 3

使用numpy来比较所有的Amounts值与Caps值,通过广播到2D数组a,然后使用构造函数创建DataFrame,每列求和,然后转置使用DataFrame.TDataFrame.add_prefix

对于聚合列,使用DataFrame.insert来插入第一列,使用GroupBy.sum

df1['Caps'] = df1['Caps'].astype(int)
df2['Amounts'] = df2['Amounts'].astype(int)

am = df2['Amounts'].to_numpy()
ca = df1['Caps'].to_numpy()
#a  = np.where(am &lt;= ca[:, None], am[None, :], ca[:, None])
a  = np.where(am <= ca[:, None], am[None, :], ca[:, None])

df1 = (pd.DataFrame(a,columns=df2['Type'],index=df1['Caps'])
         .sum(axis=1, level=0).T.add_suffix('limit'))
df1.insert(0, 'Total', df2.groupby('Type')['Amounts'].sum())
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
  Type  Total  25limit  50limit  100limit
0    a    130       50       95       130
1    b    195       75      125       190
2    c     35       25       35        35
3    d     80       25       50        80

请注意,这是给定代码的翻译,其中包括代码注释。

英文:

Use numpy for compare all values Amounts with Caps by broadcasting to 2d array a, then create DataFrame by constructor with sum per columns, transpose by DataFrame.T and DataFrame.add_prefix.

For aggregated column use DataFrame.insert for first column with GroupBy.sum:

df1[&#39;Caps&#39;] = df1[&#39;Caps&#39;].astype(int)
df2[&#39;Amounts&#39;] = df2[&#39;Amounts&#39;].astype(int)

am = df2[&#39;Amounts&#39;].to_numpy()
ca = df1[&#39;Caps&#39;].to_numpy()
#pandas below 0.24
#am = df2[&#39;Amounts&#39;].values
#ca = df1[&#39;Caps&#39;].values
a  = np.where(am &lt;= ca[:, None], am[None, :], ca[:, None])

df1 = (pd.DataFrame(a,columns=df2[&#39;Type&#39;],index=df1[&#39;Caps&#39;])
         .sum(axis=1, level=0).T.add_suffix(&#39;limit&#39;))
df1.insert(0, &#39;Total&#39;, df2.groupby(&#39;Type&#39;)[&#39;Amounts&#39;].sum())
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
  Type  Total  25limit  50limit  100limit
0    a    130       50       95       130
1    b    195       75      125       190
2    c     35       25       35        35
3    d     80       25       50        80

答案2

得分: 0

以下是已翻译的内容:

这是我的解决方案,没有使用numpy,但比@jezrael的解决方案慢两倍,10.5毫秒对比5.07毫秒。

limcols = df1.Caps.to_list()
df2 = df2.reindex(columns=["Amounts", "Type"] + limcols)

df2[limcols] = df2[limcols].transform(
    lambda sc: np.where(df2.Amounts.le(sc.name), df2.Amounts, sc.name))

# Summations:
g = df2.groupby("Type")
df3 = g[limcols].sum()
df3.insert(0, "Total", g.Amounts.sum())

# Renaming columns:
c_dic = {lim: f"{lim:.0f}limit" for lim in limcols}
df3 = df3.rename(columns=c_dic).reset_index()

# Cleanup:
# df2 = df2.drop(columns=limcols)
英文:

Here is my solution without numpy, however it is two times slower than @jezrael's solution, 10.5ms vs. 5.07ms.

limcols= df1.Caps.to_list()
df2=df2.reindex(columns=[&quot;Amounts&quot;,&quot;Type&quot;]+limcols)

df2[limcols]= df2[limcols].transform( \
                  lambda sc: np.where(df2.Amounts.le(sc.name),df2.Amounts,sc.name))

# Summations:
g=df2.groupby(&quot;Type&quot;)
df3= g[limcols].sum()
df3.insert(0,&quot;Total&quot;, g.Amounts.sum())

# Renaming columns:
c_dic={ lim:f&quot;{lim:.0f}limit&quot; for lim in limcols}
df3= df3.rename(columns=c_dic).reset_index()

# Cleanup:
#df2=df2.drop(columns=limcols)

huangapple
  • 本文由 发表于 2020年1月3日 15:14:59
  • 转载请务必保留本文链接:https://go.coder-hub.com/59574571.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定