Pandas groupby sorting

Question

I am trying to analyze my Netflix viewing data with pandas. I want to sum the time each profile spent watching each title and then show the title with the highest total for each profile.

df_clean.sample(4)
Profile Name  Duration         title_clean
AAA           0 days 00:20:00  Harry Potter
AAA           0 days 00:41:50  The Sinner
BBB           0 days 00:00:15  Avatar
AAA           0 days 00:15:00  Harry Potter

I want to see only the top row for each profile, i.e. the title with the largest total Duration.

I tried to use:

df_clean.groupby(['Profile Name','title_clean'])['Duration'].sum().sort_values(ascending=False).nlargest(1)

But it returns only the single largest result overall, so only one profile appears:

Profile Name  title_clean
AAA           Harry Potter  0 days 00:35:00
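To reproduce the problem, here is a minimal sketch that rebuilds the four sample rows above as df_clean. It assumes Duration is a timedelta column (as the "0 days 00:20:00" display suggests) and that the title column is named title_clean, as in the code. nlargest(1) is applied to the whole summed Series rather than to each profile, so it can only ever return a single row:

```python
import pandas as pd

# Rebuild the four sample rows from the question (Duration assumed to be a timedelta)
df_clean = pd.DataFrame({
    "Profile Name": ["AAA", "AAA", "BBB", "AAA"],
    "Duration": pd.to_timedelta(["00:20:00", "00:41:50", "00:00:15", "00:15:00"]),
    "title_clean": ["Harry Potter", "The Sinner", "Avatar", "Harry Potter"],
})

# Total watch time per (profile, title) pair
totals = df_clean.groupby(["Profile Name", "title_clean"])["Duration"].sum()

# nlargest(1) looks at the whole Series at once, so only one row survives
attempt = totals.sort_values(ascending=False).nlargest(1)
print(attempt)
```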

Answer 1

Score: 2

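You can chain another groupby(level=0) and head(1) to get the result you're looking for. Because the summed Series is already sorted in descending order, head(1) keeps the first, and therefore largest, row within each Profile Name group (level 0 of the index).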

(df_clean.groupby(['Profile Name', 'title_clean'])['Duration'].sum()
 .sort_values(ascending=False)
 .groupby(level=0)
 .head(1)
)
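A minimal runnable sketch of this answer on the four sample rows from the question. The DataFrame construction is an assumption based on the printed sample, with Duration as a timedelta column:

```python
import pandas as pd

# Sample rows from the question (Duration assumed to be a timedelta column)
df_clean = pd.DataFrame({
    "Profile Name": ["AAA", "AAA", "BBB", "AAA"],
    "Duration": pd.to_timedelta(["00:20:00", "00:41:50", "00:00:15", "00:15:00"]),
    "title_clean": ["Harry Potter", "The Sinner", "Avatar", "Harry Potter"],
})

top_per_profile = (
    df_clean.groupby(["Profile Name", "title_clean"])["Duration"].sum()
    .sort_values(ascending=False)  # largest totals first, across all profiles
    .groupby(level=0)              # regroup by the first index level (Profile Name)
    .head(1)                       # first row of each group = that profile's maximum
)
print(top_per_profile)
```

head(1) keeps each group's first row in the Series' existing order, so the result stays sorted by total duration. If two titles tie for a profile's top spot, only one of them is kept.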

Answer 2

Score: 0

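I would use idxmax. After summing per (Profile Name, title_clean), group the totals by Profile Name and call idxmax, which returns the full index label of each profile's largest total; .loc then selects exactly those rows: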

(df_clean.groupby(['Profile Name','title_clean'])['Duration'].sum()
 .loc[lambda g: g.groupby('Profile Name').idxmax()]
 .reset_index()
)

Output:

  Profile Name title_clean        Duration
0          AAA  The Sinner 0 days 00:41:50
1          BBB      Avatar 0 days 00:00:15
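The same idxmax approach split into named steps, as a sketch on the four sample rows from the question (again assuming Duration is a timedelta column; totals, best_labels and result are illustrative names):

```python
import pandas as pd

# Sample rows from the question (Duration assumed to be a timedelta column)
df_clean = pd.DataFrame({
    "Profile Name": ["AAA", "AAA", "BBB", "AAA"],
    "Duration": pd.to_timedelta(["00:20:00", "00:41:50", "00:00:15", "00:15:00"]),
    "title_clean": ["Harry Potter", "The Sinner", "Avatar", "Harry Potter"],
})

# Total watch time per (profile, title) pair
totals = df_clean.groupby(["Profile Name", "title_clean"])["Duration"].sum()

# idxmax per profile returns the full (Profile Name, title_clean) label of each maximum
best_labels = totals.groupby("Profile Name").idxmax()

# Select those labels, then turn the MultiIndex back into ordinary columns
result = totals.loc[best_labels].reset_index()
print(result)
```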

Answer 3

Score: 0

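import pandas as pd

def function1(dd: pd.DataFrame):
    # Total Duration per title within one profile, keeping only the largest
    dd1 = dd.groupby("title_clean")['Duration'].sum().sort_values(ascending=False).head(1)
    return dd1.rename("Duration")

df_clean.groupby('Profile Name').apply(function1)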

Output:

Profile Name  title_clean
AAA           The Sinner     0 days 00:41:50
BBB           Avatar         0 days 00:00:15

huangapple
  • This article was published on February 7, 2023 at 01:41:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75364753.html