pandas groupby nunique per multiple columns
Question
I have a DataFrame:
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
                   'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
                   'sth_else1': ['xx', 'yy', 'xx', 'xx', 'xx'],
                   'sth_else2': ['zz', 'yy', 'zz', 'xx', 'xx']})
   id user        date sth_else1 sth_else2
0   1   u1  2021-04-25        xx        zz
1   1   u1  2021-04-25        yy        yy
2   2   u2  2021-04-25        xx        zz
3   2   u2  2021-04-26        xx        xx
4   3   u3  2021-04-25        xx        xx
I would like to add a column to this DataFrame (probably with groupby and transform?) that shows, for each id, the number of unique user/date combinations in the whole DataFrame, so that I get this:
   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1
How would I do this?
Answer 1
Score: 1
If you want to count the number of unique user/date combinations per id, you can first flag the first occurrence of each (id, user, date) row, then sum those flags per id with groupby.transform('sum'):
df['count_per_id_user_and_date'] = (~df[['id', 'user', 'date']].duplicated()
                                   ).groupby(df['id']).transform('sum')
Or drop the duplicate rows, count what remains per id, and map the counts back onto the id column:
df['count_per_id_user_and_date'] = df['id'].map(
    df[['id', 'user', 'date']].drop_duplicates()
    .groupby('id').size()
)
Output:
   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1
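To make the two approaches easier to follow, here is a minimal, self-contained sketch that runs both on the sample data and prints the intermediate results. The names `keys`, `is_first`, `per_id`, `via_mask`, and `via_map` are only illustrative and do not appear in the answer itself; the `sth_else` columns are left out because they play no part in the count.

```python
import pandas as pd

# Sample data from the question (extra columns omitted for brevity).
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

keys = ['id', 'user', 'date']

# Approach 1: mark the first occurrence of each (id, user, date) row.
# duplicated() is True for repeats, so its negation is True for first occurrences.
is_first = ~df[keys].duplicated()
print(is_first.tolist())  # [True, False, True, True, True]

# Summing the flags within each id counts the distinct combinations for that id,
# and transform broadcasts the per-id total back to every row.
via_mask = is_first.groupby(df['id']).transform('sum')

# Approach 2: keep one row per distinct combination, count rows per id,
# then look up each row's id in that per-id Series.
per_id = df[keys].drop_duplicates().groupby('id').size()
via_map = df['id'].map(per_id)

print(via_mask.tolist())  # [1, 1, 2, 2, 1]
print(via_map.tolist())   # [1, 1, 2, 2, 1]
```

Both variants give the same column here. The mask version stays aligned to the original index throughout, while the map version builds a small per-id lookup first, which can be handy if the counts are also needed on their own.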
Answer 2
Score: 0
You could try the following, which counts the rows in each (id, user, date) group:
df['final_result'] = df.groupby(['id', 'user', 'date'])['id'].transform('size')
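It may help to check this expression against the sample data, since the number of rows per (id, user, date) group is not the same quantity as the number of distinct user/date combinations per id that the question asks for. The sketch below is only an illustration; `rows_per_group` and `combos_per_id` are hypothetical names, and the second computation reuses the duplicated-and-sum idea from the first answer.

```python
import pandas as pd

# Sample data from the question (extra columns omitted for brevity).
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

keys = ['id', 'user', 'date']

# The expression from this answer: how many rows fall into each
# (id, user, date) group, broadcast back to every row.
rows_per_group = df.groupby(keys)['id'].transform('size')

# For comparison: distinct (user, date) combinations per id.
combos_per_id = (~df[keys].duplicated()).groupby(df['id']).transform('sum')

print(rows_per_group.tolist())  # [2, 2, 1, 1, 1]
print(combos_per_id.tolist())   # [1, 1, 2, 2, 1]
```

On this data the first result is 2, 2, 1, 1, 1, while the expected column in the question is 1, 1, 2, 2, 1, so the two only coincide when every id has exactly as many repeated rows as it has distinct combinations.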