pandas groupby nunique per multiple columns

Question


I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'id':[1,1,2,2,3], 
                   'user':['u1', 'u1', 'u2', 'u2', 'u3'],
                   'date':['2021-04-25','2021-04-25','2021-04-25','2021-04-26', '2021-04-25'],
                   'sth_else1':['xx','yy','xx','xx','xx'],
                   'sth_else2':['zz','yy','zz','xx','xx']})

   id user        date sth_else1 sth_else2
0   1   u1  2021-04-25        xx        zz
1   1   u1  2021-04-25        yy        yy
2   2   u2  2021-04-25        xx        zz
3   2   u2  2021-04-26        xx        xx
4   3   u3  2021-04-25        xx        xx

I would like to add a column (so probably use groupby with transform?) that shows, for each id, the number of unique user/date combinations in the whole DataFrame, so that I would get this:

   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1

How would I do this?

Answer 1

Score: 1

If you want to count the number of unique user/date combinations per id, you can first flag the non-duplicated (id, user, date) rows with ~duplicated(), then sum those flags within each id using groupby.transform('sum'):

df['count_per_id_user_and_date'] = (~df[['id', 'user', 'date']].duplicated()
                                   ).groupby(df['id']).transform('sum')
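
To make the two steps above easier to follow, here is a minimal sketch that splits them apart on the question's sample data. The names first_seen and per_id are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# True on the first occurrence of each (id, user, date) row, False on repeats.
first_seen = ~df[['id', 'user', 'date']].duplicated()

# Summing the flags within each id counts the distinct combinations;
# transform broadcasts that per-id total back onto every row.
per_id = first_seen.groupby(df['id']).transform('sum')

print(first_seen.tolist())  # [True, False, True, True, True]
print(per_id.tolist())      # [1, 1, 2, 2, 1]
```

Because transform returns a Series aligned with the original index, the result can be assigned straight to a new column.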

Or drop the duplicated rows, count what remains per id, and map those counts back onto the id column:

df['count_per_id_user_and_date'] = df['id'].map(
    df[['id', 'user', 'date']].drop_duplicates()
    .groupby('id').size()
)
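
As a rough illustration of this second variant, the sketch below prints the intermediate per-id lookup before mapping it back. The names counts and mapped are assumptions made for the example, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# One row per distinct (id, user, date), then the number of such rows per id.
counts = df[['id', 'user', 'date']].drop_duplicates().groupby('id').size()

# counts is a Series indexed by id, so map() looks up each row's id in it.
mapped = df['id'].map(counts)

print(counts.to_dict())  # {1: 1, 2: 2, 3: 1}
print(mapped.tolist())   # [1, 1, 2, 2, 1]
```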

Output:

   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1
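
Since the title asks about nunique, here is one more possible route, offered only as a sketch and not taken from the answer: label each user/date pair with an integer via groupby(...).ngroup(), then count the distinct labels per id with transform('nunique'). The name combo_id is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# Give every distinct (user, date) pair its own integer label.
combo_id = df.groupby(['user', 'date']).ngroup()

# Count how many distinct labels occur within each id, aligned to the rows.
df['count_per_id_user_and_date'] = combo_id.groupby(df['id']).transform('nunique')

print(df['count_per_id_user_and_date'].tolist())  # [1, 1, 2, 2, 1]
```

On the sample data this agrees with the output shown above.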

Answer 2

Score: 0

Maybe try this, which counts the rows in each (id, user, date) group:

df['final_result'] = df.groupby(['id', 'user', 'date'])['id'].transform('size')
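
For reference, a short sketch of what this line returns on the question's sample data. It counts rows per (id, user, date) combination, not distinct combinations per id, so the values differ from the table in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# Number of rows sharing the same (id, user, date), broadcast to each row.
df['final_result'] = df.groupby(['id', 'user', 'date'])['id'].transform('size')

print(df['final_result'].tolist())  # [2, 2, 1, 1, 1]
```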

huangapple
  • Published on July 6, 2023 at 19:55:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76628570.html