Pandas 分组、连接和求和

huangapple go评论66阅读模式
英文:

Pandas Groupby, Join and Sum

问题

使用Pandas,我正在尝试聚合类似以下结构的CSV文件:

       A       B      C             D      E
1     12345    0    California      0      5.00 
2     12345    0    Florida         0      2.00
3     67898    0    Rhode Island    0      1.00
4     67898    0    North Carolina  0      3.50
5     44444    0    Alaska          0      6.00
6     44444    0    Texas           0      3.50

我想要按列A分组,使用逗号连接列C的值,显示具有相同列A值的行的总金额,然后导出到CSV文件。CSV将如下所示:

       A       B                              C      
1     12345    California, Florida            7.00 
2     67898    Rhode Island, North Carolina   4.50
3     44444    Alaska, Texas                  9.50

我有类似以下的代码:

import pandas as pd

df = pd.read_csv('data.csv')

out = df.groupby('A', as_index=False, sort=False).agg({'C': 'sum', 'C': 'first'})

out.to_csv('out.csv')

如果需要进一步的帮助,请告诉我。

英文:

Using Pandas Im trying to aggregate some info of a csv that looks like this

       A       B      C             D      E
1     12345    0    California      0      5.00 
2     12345    0    Florida         0      2.00
3     67898    0    Rhode Island    0      1.00
4     67898    0    North Carolina  0      3.50
5     44444    0    Alaska          0      6.00
6     44444    0    Texas           0      3.50

I want to group by column A, join by commas values on column C , display sum amount of rows that have same value of column A then export to csv

The csv will look like this

       A       B                              C      
1     12345    California, Florida            7.00 
2     67898    Rhode Island,North Carolina    4.50
3     44444    Alaska, Texas                  9.50

I have something like the following:

import pandas as pd

df = pd.read_csv('data.csv')

out = df.groupby('A', as_index=False, sort=False).agg({'C':'sum', 'C':'first'})

out.to_csv('out.csv')

答案1

得分: 3

你可以在agg()调用中使用命名聚合来对列CE应用不同的函数。对于列C,你想要使用join()将字符串连接在一起。对于列E,你想要使用sum将分组的值相加:

df2 = df.groupby('A', as_index=False).agg(C=('C', lambda x: ', '.join(map(str, x))), E=('E', sum))

df2:

       A                             C    E
0  12345           California, Florida  7.0
1  44444                 Alaska, Texas  9.5
2  67898  Rhode Island, North Carolina  4.5
英文:

You can used named aggregation in your agg() call to apply different functions to col C and E. For col C you want to join() the strings together. For col E you want to sum the grouped values together:

df2 = df.groupby('A', as_index=False).agg(C=('C', lambda x: ', '.join(map(str, x))), E=('E', sum))

df2:

       A                             C    E
0  12345           California, Florida  7.0
1  44444                 Alaska, Texas  9.5
2  67898  Rhode Island, North Carolina  4.5

huangapple
  • 本文由 发表于 2023年4月11日 04:28:45
  • 转载请务必保留本文链接:https://go.coder-hub.com/75980474.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定