Groupby and transform across a group, not within it.

Question

I'm trying to get the third column of this dataframe, given the first two columns.

I can't work out what to search for. It's something like groupby('group')['weeklyTotal'].cumsum(), but accumulating across groups rather than within each one.

I know I could pull out those two columns, make them distinct, do the cumsum there and join the result back, but I'd much prefer to do this within the same dataframe.
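For comparison, the separate-frame workaround described above could be sketched as follows. It assumes every row of a group carries the same weeklyTotal (as in the example data); the names `distinct` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2]})

# Pull out the two columns, keep one row per distinct (group, weeklyTotal) pair,
# and take a running total over those rows in order of first appearance
distinct = (df[['group', 'weeklyTotal']]
            .drop_duplicates()
            .assign(CumulativeTotal=lambda d: d['weeklyTotal'].cumsum()))

# Join the running total back onto every original row (left join keeps df's row order)
result = df.merge(distinct[['group', 'CumulativeTotal']], on='group', how='left')
print(result)
```

If a group could carry more than one weeklyTotal value, `drop_duplicates()` would keep several rows for that group and the merge would duplicate rows, so this only holds under the stated assumption.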

To save a tiny bit of pain, here's an example dataframe:

import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2]})

group  weeklyTotal  CumulativeTotal
A                1                1
A                1                1
A                1                1
B                3                4
B                3                4
B                3                4
C                2                6
C                2                6
C                2                6
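For contrast, a plain groupby cumsum restarts the running total inside every group, which is why it does not produce the CumulativeTotal column above. A quick check on the same data:

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2]})

# Running total computed separately within each group (resets at every new group)
within = df.groupby('group')['weeklyTotal'].cumsum()
print(within.tolist())  # [1, 2, 3, 3, 6, 9, 2, 4, 6]
```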

Answer 1

Score: 1

Keep only one row per group with drop_duplicates, compute the cumsum and map the values:

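# Keep the first row of each group, take a running total over those rows,
# then map each group's running total back onto all of its rows
df['CumulativeTotal'] = df['group'].map(df.drop_duplicates(subset='group')
                                        .set_index('group')['weeklyTotal']
                                        .cumsum()
                                       )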
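A self-contained version of the drop_duplicates/map approach, wrapped in a hypothetical helper `cumulative_across_groups` (the name and its `key`/`value` parameters are illustrative, not part of the answer). Because the running total is built from the first row of each group and then mapped by group label, the rows of a group do not need to be adjacent; totals follow the order in which groups first appear.

```python
import pandas as pd


def cumulative_across_groups(frame, key='group', value='weeklyTotal'):
    """Running total over the first row of each group, broadcast to all its rows."""
    per_group = (frame.drop_duplicates(subset=key)
                      .set_index(key)[value]
                      .cumsum())
    # per_group is indexed by group label, so map() looks up each row's total
    return frame[key].map(per_group)


df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2]})
df['CumulativeTotal'] = cumulative_across_groups(df)
print(df)

# Non-adjacent rows of the same group still receive that group's total
shuffled = pd.DataFrame({'group': ['A', 'B', 'A', 'C', 'B'],
                         'weeklyTotal': [1, 3, 1, 2, 3]})
print(cumulative_across_groups(shuffled).tolist())  # [1, 4, 1, 6, 4]
```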

Or, using a mask and duplicated:

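# Zero out every row whose group has already appeared, so each group's
# weeklyTotal is counted once, then take a running total down the frame
df['CumulativeTotal'] = (df['weeklyTotal']
                         .mask(df['group'].duplicated(), 0)
                         .cumsum()
                        )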

Output:

  group  weeklyTotal  CumulativeTotal
0     A            1                1
1     A            1                1
2     A            1                1
3     B            3                4
4     B            3                4
5     B            3                4
6     C            2                6
7     C            2                6
8     C            2                6
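The mask/duplicated version counts each group at its first row and then accumulates in row order, so it matches the map version only when each group's rows are contiguous. A small sketch (the helper `cumulative_by_mask` and the `shuffled` frame are hypothetical) showing what happens when they are not:

```python
import pandas as pd


def cumulative_by_mask(frame, key='group', value='weeklyTotal'):
    # Replace the value with 0 on rows whose key has already appeared,
    # so each group is counted once (at its first row), then accumulate
    return frame[value].mask(frame[key].duplicated(), 0).cumsum()


df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2]})
print(cumulative_by_mask(df).tolist())  # [1, 1, 1, 4, 4, 4, 6, 6, 6]

shuffled = pd.DataFrame({'group': ['A', 'B', 'A', 'C', 'B'],
                         'weeklyTotal': [1, 3, 1, 2, 3]})
# The second 'A' row picks up B's running total instead of A's
print(cumulative_by_mask(shuffled).tolist())  # [1, 4, 4, 6, 6]
```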

Answer 2

Score: 0

Here is another way:

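# True on the first row of each run of identical consecutive 'group' values
m = df['group'].ne(df['group'].shift())
# Keep weeklyTotal only on those first rows (0 elsewhere) and accumulate
m.mul(df['weeklyTotal']).cumsum()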

Output:

0    1
1    1
2    1
3    4
4    4
5    4
6    6
7    6
8    6
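The shift-based version marks the start of every run of consecutive equal group values, so a group that reappears later is counted again. A sketch of that behaviour, plus one possible workaround (stable-sorting by group first, then restoring the original row order); the helper `cumulative_by_runs` and the `shuffled` frame are hypothetical, and note that sorting orders groups alphabetically rather than by first appearance.

```python
import pandas as pd


def cumulative_by_runs(frame, key='group', value='weeklyTotal'):
    # True on the first row of every run of identical consecutive keys
    starts = frame[key].ne(frame[key].shift())
    # Count the value once per run (0 on the other rows), then accumulate
    return starts.mul(frame[value]).cumsum()


shuffled = pd.DataFrame({'group': ['A', 'B', 'A', 'C', 'B'],
                         'weeklyTotal': [1, 3, 1, 2, 3]})
# Every row starts a new run here, so each value is added again
print(cumulative_by_runs(shuffled).tolist())  # [1, 4, 5, 7, 10]

# Sort so each group's rows are adjacent, compute, then restore original order
ordered = shuffled.sort_values('group', kind='stable')
print(cumulative_by_runs(ordered).sort_index().tolist())  # [1, 4, 1, 6, 4]
```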

huangapple
  • Published 2023-05-22 23:49:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76307905.html