pandas groupby nunique per multiple columns

Question


I have a DataFrame:

    import pandas as pd

    df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                       'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
                       'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
                       'sth_else1': ['xx', 'yy', 'xx', 'xx', 'xx'],
                       'sth_else2': ['zz', 'yy', 'zz', 'xx', 'xx']})

which looks like this:

       id user        date sth_else1 sth_else2
    0   1   u1  2021-04-25        xx        zz
    1   1   u1  2021-04-25        yy        yy
    2   2   u2  2021-04-25        xx        zz
    3   2   u2  2021-04-26        xx        xx
    4   3   u3  2021-04-25        xx        xx

I would like to add a column (probably using groupby with transform?) that shows, for each id, the number of unique user/date combinations that id has in the whole DataFrame, so that I get this:

       id user        date sth_else1 sth_else2  count_per_id_user_and_date
    0   1   u1  2021-04-25        xx        zz                           1
    1   1   u1  2021-04-25        yy        yy                           1
    2   2   u2  2021-04-25        xx        zz                           2
    3   2   u2  2021-04-26        xx        xx                           2
    4   3   u3  2021-04-25        xx        xx                           1

How would I do this?

Answer 1

Score: 1


If you want to count the number of unique user/date combinations per id, you can first flag the non-duplicated (id, user, date) rows, then sum those flags per id with groupby.transform('sum'):

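    # ~duplicated() is True for the first occurrence of each (id, user, date) row;
    # summing those flags per id counts the distinct user/date pairs.
    df['count_per_id_user_and_date'] = (~df[['id', 'user', 'date']].duplicated()
                                        ).groupby(df['id']).transform('sum')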
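
To see why this works, it can help to look at the intermediate values. The following is only a sketch on the sample data from the question (the extra sth_else columns are left out because they play no role here); the names `first_seen` and `counts` are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# True on the first occurrence of each (id, user, date) row, False on repeats.
first_seen = ~df[['id', 'user', 'date']].duplicated()
print(first_seen.tolist())  # [True, False, True, True, True]

# Summing the booleans per id counts the distinct (user, date) pairs of that id;
# transform broadcasts the per-id total back onto every row.
counts = first_seen.groupby(df['id']).transform('sum')
print(counts.tolist())  # [1, 1, 2, 2, 1]
```

Row 1 repeats row 0, so it contributes nothing to the sum for id 1, while id 2 has two different dates and therefore gets 2.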

Or:

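    # Keep one row per distinct (id, user, date), count those rows per id,
    # then look up each row's id in the resulting per-id Series.
    df['count_per_id_user_and_date'] = df['id'].map(
        df[['id', 'user', 'date']].drop_duplicates()
                                  .groupby('id').size()
    )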
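
The same chain can be split into named steps to inspect what each one produces. This is a sketch on the question's sample data; `unique_rows` and `per_id` are illustrative names introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# One row per distinct (id, user, date) combination.
unique_rows = df[['id', 'user', 'date']].drop_duplicates()

# Number of distinct combinations for each id, indexed by id.
per_id = unique_rows.groupby('id').size()
print(per_id.to_dict())  # {1: 1, 2: 2, 3: 1}

# map looks up each row's id in the per-id Series.
df['count_per_id_user_and_date'] = df['id'].map(per_id)
print(df['count_per_id_user_and_date'].tolist())  # [1, 1, 2, 2, 1]
```

Because `per_id` is built from the same DataFrame, every id is present in it, so the lookup leaves no missing values.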

Output:

       id user        date sth_else1 sth_else2  count_per_id_user_and_date
    0   1   u1  2021-04-25        xx        zz                           1
    1   1   u1  2021-04-25        yy        yy                           1
    2   2   u2  2021-04-25        xx        zz                           2
    3   2   u2  2021-04-26        xx        xx                           2
    4   3   u3  2021-04-25        xx        xx                           1
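
Since the title mentions nunique, here is one more possible route, offered as a sketch rather than as part of the answer above. It assumes it is acceptable to label each (user, date) pair with an integer via `groupby(...).ngroup()` and then apply `transform('nunique')` per id; `pair_id` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# Give every distinct (user, date) pair an integer label, aligned with df's index.
pair_id = df.groupby(['user', 'date']).ngroup()

# Count the distinct labels per id and broadcast the result to each row.
df['count_per_id_user_and_date'] = pair_id.groupby(df['id']).transform('nunique')
print(df['count_per_id_user_and_date'].tolist())  # [1, 1, 2, 2, 1]
```

On the sample data this gives the same column as the two approaches above.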

Answer 2

Score: 0


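You could try the following. Note that it returns the number of rows in each (id, user, date) group, which is not the same as the number of unique user/date combinations per id:

    df['final_result'] = df.groupby(['id', 'user', 'date'])['id'].transform('size')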
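
A quick check on the question's sample data, written as a sketch, puts this expression next to a per-id count of distinct combinations. The names `rows_per_group` and `unique_per_id` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
})

# Rows per (id, user, date) group: repeated rows are counted, not collapsed.
rows_per_group = df.groupby(['id', 'user', 'date'])['id'].transform('size')
print(rows_per_group.tolist())  # [2, 2, 1, 1, 1]

# Distinct (user, date) pairs per id, which is what the question asks for.
unique_per_id = (
    ~df[['id', 'user', 'date']].duplicated()
).groupby(df['id']).transform('sum')
print(unique_per_id.tolist())  # [1, 1, 2, 2, 1]
```

The two results differ on every id that has either repeated rows (id 1) or more than one date (id 2).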

huangapple
  • Posted on 2023-07-06 19:55:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76628570.html