2023-05-22 23:49:31

Groupby and transform across a group, not within it

Question

I'm trying to get the third column of this dataframe, given the first two columns.

I can't work out what to search for. It's like a version of groupby('group')['weeklyTotal'].cumsum() that accumulates across the groups rather than within each one.

I know I could pull out those two columns, make them distinct, then do the groupby cumsum, but would much prefer to have this within the same dataframe.

To save a tiny bit of pain, here's an example dataframe:

import pandas as pd

df = pd.DataFrame({'group':['A','A','A','B','B','B','C','C','C'], 'weeklyTotal':[1,1,1,3,3,3,2,2,2]})

Group	WeeklyTotal	CumulativeTotal
A	1	1
A	1	1
A	1	1
B	3	4
B	3	4
B	3	4
C	2	6
C	2	6
C	2	6
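
The workaround described in the question (pull out the two columns, make them distinct, take the running total, then bring it back) could look roughly like the sketch below. It assumes every group carries a single weeklyTotal value, as in the example data; the names totals and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2],
})

# One row per distinct (group, weeklyTotal) pair, in order of first appearance,
# with a running total taken across the groups.
totals = (
    df[['group', 'weeklyTotal']]
    .drop_duplicates()
    .assign(CumulativeTotal=lambda t: t['weeklyTotal'].cumsum())
)

# Attach the per-group running total to every original row.
# A left merge keeps the row order of df.
result = df.merge(totals, on=['group', 'weeklyTotal'], how='left')
print(result)
```

This produces the wanted column, but in a second dataframe, which is the detour the question is trying to avoid.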

Answer 1

Score: 1

Keep only one row per group with drop_duplicates, compute the cumsum and map the values:

df['CumulativeTotal'] = df['group'].map(df.drop_duplicates(subset='group')
                                          .set_index('group')['weeklyTotal']
                                          .cumsum()
                                       )
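
To see what each step of that chain contributes, here is the same idea split into named intermediates. This is only a sketch on the example data from the question; per_group and running are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2],
})

# Keep the first row of each group and index the weekly totals by group label.
per_group = df.drop_duplicates(subset='group').set_index('group')['weeklyTotal']

# Running total across groups: a Series indexed by group label.
running = per_group.cumsum()

# Look up each row's group label in that Series.
df['CumulativeTotal'] = df['group'].map(running)
print(df)
```

Because the lookup is keyed on the group label, every row of a group receives the same value regardless of where it sits in the dataframe.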

Or, using a mask and duplicated:

df['CumulativeTotal'] = (df['weeklyTotal']
                         .mask(df['group'].duplicated(), 0)
                         .cumsum()
                        )

Output:

  group  weeklyTotal  CumulativeTotal
0     A            1                1
1     A            1                1
2     A            1                1
3     B            3                4
4     B            3                4
5     B            3                4
6     C            2                6
7     C            2                6
8     C            2                6
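
The mask and duplicated variant can be unpacked the same way. The sketch below uses the example data from the question and shows the intermediate values; is_repeat and first_only are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2],
})

# False on the first row of each group, True on every later row of that group.
is_repeat = df['group'].duplicated()

# Replace the repeated rows with 0 so each group's total is counted only once.
first_only = df['weeklyTotal'].mask(is_repeat, 0)

# A plain cumulative sum over the whole column now only steps up at group starts.
df['CumulativeTotal'] = first_only.cumsum()
print(df)
```

Note that this is a row-by-row running sum, so it relies on the rows of each group sitting next to each other, as they do in the example.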

Answer 2

Score: 0

Here is another way:

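# True on the first row of each consecutive run of the same group
m = df['group'].ne(df['group'].shift())
# Count weeklyTotal only on those rows, then take the running sum
df['CumulativeTotal'] = m.mul(df['weeklyTotal']).cumsum()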
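
One thing to watch with the shift comparison: it detects the start of each consecutive run, so it assumes the rows of a group are adjacent. The sketch below uses a small made-up dataframe (mixed, not from the question) in which the groups are interleaved, and compares the result with the label-based map approach.

```python
import pandas as pd

# Hypothetical data: rows of the same group are not adjacent.
mixed = pd.DataFrame({'group': ['A', 'B', 'A', 'B'],
                      'weeklyTotal': [1, 3, 1, 3]})

# shift/ne: every row here starts a new run, so every row is counted.
starts = mixed['group'].ne(mixed['group'].shift())
by_shift = starts.mul(mixed['weeklyTotal']).cumsum()

# map: keyed on the group label, so row order does not matter.
running = (mixed.drop_duplicates(subset='group')
                .set_index('group')['weeklyTotal']
                .cumsum())
by_map = mixed['group'].map(running)

# Sorting by group first (stable, to keep the original order within a group)
# makes the shift approach agree with the map approach.
ordered = mixed.sort_values('group', kind='stable')
starts_sorted = ordered['group'].ne(ordered['group'].shift())
by_shift_sorted = starts_sorted.mul(ordered['weeklyTotal']).cumsum().sort_index()

print(by_shift.tolist())         # interleaved rows are counted repeatedly
print(by_map.tolist())
print(by_shift_sorted.tolist())
```

On data that is already grouped contiguously, as in the question, the two approaches give the same column.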
