pandas groupby nunique per multiple columns
Question
The following code adds the "count_per_id_user_and_date" column to the DataFrame using groupby and transform:
import pandas as pd
df = pd.DataFrame({'id':[1,1,2,2,3],
'user':['u1', 'u1', 'u2', 'u2', 'u3'],
'date':['2021-04-25','2021-04-25','2021-04-25','2021-04-26', '2021-04-25'],
'sth_else1':['xx','yy','xx','xx','xx'],
'sth_else2':['zz','yy','zz','xx','xx']})
# Flag the first occurrence of each (id, user, date) row, then sum the flags per id
df['count_per_id_user_and_date'] = (~df[['id', 'user', 'date']].duplicated()).groupby(df['id']).transform('sum')
print(df)
This code adds the desired column to the DataFrame, showing the number of unique user/date combinations per id in the whole DataFrame.
Original question:
I have a dataframe
df = pd.DataFrame({'id':[1,1,2,2,3],
'user':['u1', 'u1', 'u2', 'u2', 'u3'],
'date':['2021-04-25','2021-04-25','2021-04-25','2021-04-26', '2021-04-25'],
'sth_else1':['xx','yy','xx','xx','xx'], 'sth_else2':['zz','yy','zz','xx','xx']})
   id user        date sth_else1 sth_else2
0   1   u1  2021-04-25        xx        zz
1   1   u1  2021-04-25        yy        yy
2   2   u2  2021-04-25        xx        zz
3   2   u2  2021-04-26        xx        xx
4   3   u3  2021-04-25        xx        xx
I would like to add a column (so probably use groupby with transform?) that shows, for each id, the number of unique user/date combinations in the whole dataframe, so that I would get this:
   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1
How would I do this?
Answer 1
Score: 1
If you want to count the number of unique user/date combinations per id, you can first flag the non-duplicated rows, then sum those flags within each id using groupby.transform('sum'):
df['count_per_id_user_and_date'] = (~df[['id', 'user', 'date']].duplicated()
).groupby(df['id']).transform('sum')
Or:
df['count_per_id_user_and_date'] = df['id'].map(
df[['id', 'user', 'date']].drop_duplicates()
.groupby('id').size()
)
Output:
   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1
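To make the two approaches easier to follow, here is a minimal sketch that spells out the intermediate steps on the question's sample data (the sth_else columns are left out because they do not affect the count). The variable names is_first, via_mask, per_id and via_map are illustrative only, and the explicit cast to int before summing is a precaution added here, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# True for the first occurrence of each (id, user, date) combination,
# False for any later repeat of the same combination.
is_first = ~df[['id', 'user', 'date']].duplicated()

# Summing the flags within each id counts the distinct combinations;
# transform broadcasts the per-id total back onto every row.
via_mask = is_first.astype(int).groupby(df['id']).transform('sum')

# Alternative: build a small id -> count lookup, then map it onto the rows.
per_id = df[['id', 'user', 'date']].drop_duplicates().groupby('id').size()
via_map = df['id'].map(per_id)

print(is_first.tolist())  # [True, False, True, True, True]
print(via_mask.tolist())  # [1, 1, 2, 2, 1]
print(via_map.tolist())   # [1, 1, 2, 2, 1]
```

Both variants give the same column; the map version keeps the per-id counts around as a separate Series, which can be handy if they are needed elsewhere.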
Answer 2
Score: 0
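You could try this piece of code. Note that it counts the rows in each (id, user, date) group rather than the unique user/date combinations per id, so on the sample data it gives 2, 2, 1, 1, 1 instead of the output requested in the question: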
df['final_result'] = df.groupby(['id', 'user', 'date'])['id'].transform('size')
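As a rough illustration of how a row count per group differs from a distinct count per id, the sketch below runs both on the question's sample data. The second computation, which labels each (user, date) pair with ngroup() and then applies nunique per id, is an alternative suggested here and does not come from either answer; the names rows_per_group, pair_label and unique_pairs_per_id are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# Number of rows in each (id, user, date) group: repeated rows raise the
# count, so this is not a count of distinct combinations.
rows_per_group = df.groupby(['id', 'user', 'date'])['id'].transform('size')

# Give every distinct (user, date) pair an integer label, then count the
# distinct labels within each id.
pair_label = df.groupby(['user', 'date']).ngroup()
unique_pairs_per_id = pair_label.groupby(df['id']).transform('nunique')

print(rows_per_group.tolist())       # [2, 2, 1, 1, 1]
print(unique_pairs_per_id.tolist())  # [1, 1, 2, 2, 1]
```

The ngroup-based variant relies on the default handling of missing keys, so data with NaN in user or date would need a closer look before using it.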