Posted: 2023-07-06 19:55:10

pandas groupby nunique per multiple columns

Question

I have a dataframe

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
                   'date': ['2021-04-25', '2021-04-25', '2021-04-25', '2021-04-26', '2021-04-25'],
                   'sth_else1': ['xx', 'yy', 'xx', 'xx', 'xx'],
                   'sth_else2': ['zz', 'yy', 'zz', 'xx', 'xx']})

   id user        date sth_else1 sth_else2
0   1   u1  2021-04-25        xx        zz
1   1   u1  2021-04-25        yy        yy
2   2   u2  2021-04-25        xx        zz
3   2   u2  2021-04-26        xx        xx
4   3   u3  2021-04-25        xx        xx

and I would like to add a column (so probably use groupby with transform?) to that dataframe that shows me the number of unique user/date combinations I have per id in the whole dataframe, so that I would get this

   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1

how would I do this?

Answer 1

Score: 1

If you want to count the number of unique user/date combinations per id, you can first flag the first occurrence of each (id, user, date) combination with ~duplicated(), then add up those flags within each id using groupby.transform('sum'):

df['count_per_id_user_and_date'] = (~df[['id', 'user', 'date']].duplicated()
                                   ).groupby(df['id']).transform('sum')
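
To see why this works, the expression can be split into its two steps. The following is a minimal sketch on the question's sample data; the intermediate names `key_cols`, `is_first` and `counts` are introduced here only for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
                   'date': ['2021-04-25', '2021-04-25', '2021-04-25',
                            '2021-04-26', '2021-04-25'],
                   'sth_else1': ['xx', 'yy', 'xx', 'xx', 'xx'],
                   'sth_else2': ['zz', 'yy', 'zz', 'xx', 'xx']})

key_cols = ['id', 'user', 'date']  # illustrative name

# duplicated() flags every repeat of an earlier (id, user, date) row;
# negating it leaves True only on the first occurrence of each combination.
is_first = ~df[key_cols].duplicated()

# Summing the booleans within each id counts the distinct combinations,
# and transform broadcasts that count back onto every row of the id.
counts = is_first.groupby(df['id']).transform('sum')

print(is_first.tolist())  # [True, False, True, True, True]
print(counts.tolist())    # [1, 1, 2, 2, 1]
```

Row 1 repeats the combination already seen in row 0, so it contributes nothing to the sum for id 1.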

Or:

df['count_per_id_user_and_date'] = df['id'].map(
    df[['id', 'user', 'date']].drop_duplicates()
    .groupby('id').size()
)

Output:

   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1
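
Since the title asks about nunique, another possible route is to give each user/date pair an integer label and apply nunique to those labels. This is only a sketch and is not taken from the answer: it uses `groupby(...).ngroup()`, the name `pair_label` is hypothetical, and it assumes the user and date columns contain no missing values.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
                   'date': ['2021-04-25', '2021-04-25', '2021-04-25',
                            '2021-04-26', '2021-04-25'],
                   'sth_else1': ['xx', 'yy', 'xx', 'xx', 'xx'],
                   'sth_else2': ['zz', 'yy', 'zz', 'xx', 'xx']})

# Give each distinct (user, date) pair an integer label (hypothetical helper).
pair_label = df.groupby(['user', 'date']).ngroup()

# Count the distinct labels within each id and broadcast back to the rows.
df['count_per_id_user_and_date'] = (
    pair_label.groupby(df['id']).transform('nunique')
)

print(df['count_per_id_user_and_date'].tolist())  # [1, 1, 2, 2, 1]
```

On the sample data this gives the same column as the two approaches above.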

Answer 2

Score: 0

You could try this, which counts the rows in each (id, user, date) group:

df['final_result'] = df.groupby(['id', 'user', 'date'])['id'].transform('size')
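
It may help to compare this with the requested column on the sample data. In the sketch below (the names `rows_per_combo` and `combos_per_id` are illustrative), `transform('size')` over all three keys reports how many rows share a combination. That is not the same as the number of distinct user/date combinations per id.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
                   'date': ['2021-04-25', '2021-04-25', '2021-04-25',
                            '2021-04-26', '2021-04-25'],
                   'sth_else1': ['xx', 'yy', 'xx', 'xx', 'xx'],
                   'sth_else2': ['zz', 'yy', 'zz', 'xx', 'xx']})

keys = ['id', 'user', 'date']

# Number of rows that share each (id, user, date) combination.
rows_per_combo = df.groupby(keys)['id'].transform('size')

# Number of distinct (user, date) combinations within each id.
combos_per_id = (~df[keys].duplicated()).groupby(df['id']).transform('sum')

print(rows_per_combo.tolist())  # [2, 2, 1, 1, 1]
print(combos_per_id.tolist())   # [1, 1, 2, 2, 1]
```

The two results differ for ids 1 and 2, so on this data only the second matches the expected output in the question.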
