Posted 2023-02-07 01:41:06

Pandas groupby sorting

Question

I am trying to analyze my Netflix viewing data with pandas. I want to sum the time each profile spent watching each title and print the title with the highest total for each profile.

df_clean.sample(4)

Profile Name	Duration	title_clean
AAA	0 days 00:20:00	Harry Potter
AAA	0 days 00:41:50	The Sinner
BBB	0 days 00:00:15	Avatar
AAA	0 days 00:15:00	Harry Potter

I want to see only the top row (the title with the largest total) for each profile.

I tried:

df_clean.groupby(['Profile Name', 'title_clean'])['Duration'].sum().sort_values(ascending=False).nlargest(1)

But it returns only the single largest result, for one profile:

Profile Name	title_clean
AAA	Harry Potter	0 days 00:35:00
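To make the behaviour easy to reproduce, here is a minimal sketch that rebuilds only the four sampled rows. The DataFrame contents are an assumption based on the sample above, not the real df_clean, so the winning title differs from the output shown in the question. It illustrates that nlargest(1) is applied to the whole summed Series, not to each profile separately.

```python
import pandas as pd

# Hypothetical reconstruction of the four sampled rows; the real df_clean is larger.
df_clean = pd.DataFrame({
    "Profile Name": ["AAA", "AAA", "BBB", "AAA"],
    "Duration": pd.to_timedelta(["00:20:00", "00:41:50", "00:00:15", "00:15:00"]),
    "title_clean": ["Harry Potter", "The Sinner", "Avatar", "Harry Potter"],
})

# One total per (profile, title) pair, indexed by a two-level MultiIndex.
totals = df_clean.groupby(["Profile Name", "title_clean"])["Duration"].sum()

# nlargest(1) ranks every pair together, so a single row survives overall.
top_overall = totals.nlargest(1)
print(top_overall)
```

Profile BBB never appears in top_overall because its only total is smaller than the largest AAA total.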

Answer 1

Score: 2

You can chain another groupby(level = 0) and head(1) to get the result you're looking for.

df_clean.groupby(['Profile Name', 'title_clean'])['Duration'].sum().sort_values(ascending=False).groupby(level=0).head(1)
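As a rough illustration of the chain above, the sketch below runs it on a small DataFrame rebuilt from the four sampled rows in the question (an assumption for demonstration only, not the real data). After the descending sort, groupby(level=0) regroups by the first index level, "Profile Name", and head(1) keeps the first row of each group, which is that profile's largest total.

```python
import pandas as pd

# Hypothetical reconstruction of the four sampled rows from the question.
df_clean = pd.DataFrame({
    "Profile Name": ["AAA", "AAA", "BBB", "AAA"],
    "Duration": pd.to_timedelta(["00:20:00", "00:41:50", "00:00:15", "00:15:00"]),
    "title_clean": ["Harry Potter", "The Sinner", "Avatar", "Harry Potter"],
})

top_per_profile = (
    df_clean.groupby(["Profile Name", "title_clean"])["Duration"]
    .sum()                         # total per (profile, title)
    .sort_values(ascending=False)  # largest totals first
    .groupby(level=0)              # level 0 of the MultiIndex is "Profile Name"
    .head(1)                       # first (largest) row within each profile
)
print(top_per_profile)
```

The result keeps the sorted order, so profiles appear from the largest top total to the smallest, not alphabetically.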

Answer 2

Score: 0

I would use idxmax:

(df_clean.groupby(['Profile Name', 'title_clean'])['Duration'].sum()
 .loc[lambda g: g.groupby('Profile Name').idxmax()]
 .reset_index()
)

Output:

  Profile Name title_clean        Duration
0          AAA  The Sinner 0 days 00:41:50
1          BBB      Avatar 0 days 00:00:15
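To show what the idxmax step contributes, here is a sketch that splits the chain into named steps, again on a DataFrame rebuilt from the four sampled rows (an assumption, not the real data). The names totals, best_keys, and result are illustrative, and the keys are converted to a plain list before the lookup, which is a small deviation from the one-liner above made only to keep the indexing explicit.

```python
import pandas as pd

# Hypothetical reconstruction of the four sampled rows from the question.
df_clean = pd.DataFrame({
    "Profile Name": ["AAA", "AAA", "BBB", "AAA"],
    "Duration": pd.to_timedelta(["00:20:00", "00:41:50", "00:00:15", "00:15:00"]),
    "title_clean": ["Harry Potter", "The Sinner", "Avatar", "Harry Potter"],
})

totals = df_clean.groupby(["Profile Name", "title_clean"])["Duration"].sum()

# For each profile, idxmax returns the full (profile, title) index key
# of the row holding that profile's largest total.
best_keys = totals.groupby("Profile Name").idxmax()

# Look those keys up in the totals and flatten the index into columns.
result = totals.loc[best_keys.tolist()].reset_index()
print(result)
```

Unlike the sort-then-head approach, this one never sorts the totals, so the rows come back in group order (one row per profile).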

Answer 3

Score: 0

import pandas as pd

def function1(dd: pd.DataFrame):
    # Total duration per title within one profile; keep only the largest total
    dd1 = dd.groupby("title_clean")['Duration'].sum().sort_values(ascending=False).head(1)
    return dd1.rename("Duration")

# df_clean is the DataFrame from the question
df_clean.groupby('Profile Name').apply(function1)

Output:

Profile Name  title_clean
AAA           The Sinner     0 days 00:41:50
BBB           Avatar         0 days 00:00:15
