Groupby and transform across a group, not within it.

Question

I'm trying to get the third column of this dataframe, given the first two columns.

I can't work out what to search for. It's like a version of groupby('group')['weeklyTotal'].cumsum() that accumulates across groups rather than within each one.

I know I could pull out those two columns, make them distinct, then do the groupby cumsum, but would much prefer to have this within the same dataframe.

To save a tiny bit of pain, here's an example dataframe:

import pandas as pd
df = pd.DataFrame({'group':['A','A','A','B','B','B','C','C','C'], 'weeklyTotal':[1,1,1,3,3,3,2,2,2]})

Group  WeeklyTotal  CumulativeTotal
A      1            1
A      1            1
A      1            1
B      3            4
B      3            4
B      3            4
C      2            6
C      2            6
C      2            6
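For contrast, here is a minimal sketch of what the plain within-group cumsum returns on the example frame, next to the values the question asks for. The variable names `within` and `wanted` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2],
})

# Within-group running sum: the total restarts at every new group.
within = df.groupby('group')['weeklyTotal'].cumsum()
print(within.tolist())  # [1, 2, 3, 3, 6, 9, 2, 4, 6]

# What the question wants instead: one value per group,
# accumulated across groups and repeated on every row of the group.
wanted = [1, 1, 1, 4, 4, 4, 6, 6, 6]
```

The gap between the two lists is the whole problem: each group should contribute its weeklyTotal once, not once per row.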

Answer 1

Score: 1

Keep only one row per group with drop_duplicates, compute the cumsum and map the values:

    # one row per group -> running total per group -> mapped back onto every row
    df['CumulativeTotal'] = df['group'].map(
        df.drop_duplicates(subset='group')
          .set_index('group')['weeklyTotal']
          .cumsum()
    )

Or, using mask and duplicated to zero out every repeated row of a group before the cumsum:

    df['CumulativeTotal'] = (df['weeklyTotal']
                             .mask(df['group'].duplicated(), 0)
                             .cumsum()
                            )

Output:

      group  weeklyTotal  CumulativeTotal
    0     A            1                1
    1     A            1                1
    2     A            1                1
    3     B            3                4
    4     B            3                4
    5     B            3                4
    6     C            2                6
    7     C            2                6
    8     C            2                6
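As a rough check, the sketch below runs both variants on the example frame and then on a small interleaved frame. The `mixed` frame is hypothetical and not part of the question; it is only there to show where the two variants can diverge when a group's rows are not contiguous.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2],
})

# Variant 1: one row per group, cumsum, then map back by group label.
per_group = (df.drop_duplicates(subset='group')
               .set_index('group')['weeklyTotal']
               .cumsum())
via_map = df['group'].map(per_group)

# Variant 2: zero out every repeated row of a group, then cumsum.
via_mask = df['weeklyTotal'].mask(df['group'].duplicated(), 0).cumsum()

print(via_map.tolist())   # [1, 1, 1, 4, 4, 4, 6, 6, 6]
print(via_mask.tolist())  # [1, 1, 1, 4, 4, 4, 6, 6, 6]

# Hypothetical interleaved frame (an assumption, not from the question):
# group A reappears after group B.
mixed = pd.DataFrame({'group': ['A', 'B', 'A'], 'weeklyTotal': [1, 3, 1]})
mixed_map = mixed['group'].map(
    mixed.drop_duplicates(subset='group')
         .set_index('group')['weeklyTotal']
         .cumsum())
mixed_mask = mixed['weeklyTotal'].mask(mixed['group'].duplicated(), 0).cumsum()

print(mixed_map.tolist())   # [1, 4, 1]
print(mixed_mask.tolist())  # [1, 4, 4]
```

On contiguous data the two agree. On interleaved data the map variant gives every row of a group the same value, while the mask variant carries forward the running total at that position. Both rely on weeklyTotal being constant within a group, since only the first row of each group is used.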

Answer 2

Score: 0

Here is another way:

    # True on the first row of each consecutive run of the same group label
    m = df['group'].ne(df['group'].shift())
    m.mul(df['weeklyTotal']).cumsum()

Output:

    0    1
    1    1
    2    1
    3    4
    4    4
    5    4
    6    6
    7    6
    8    6
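A self-contained sketch of the same idea, assigned back to the dataframe. It also shows one caveat that seems worth knowing: the shift comparison marks the start of each consecutive run, so it assumes each group's rows are contiguous. The `mixed` frame and the name `starts` are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'weeklyTotal': [1, 1, 1, 3, 3, 3, 2, 2, 2],
})

# True wherever the label differs from the previous row, i.e. at run starts.
m = df['group'].ne(df['group'].shift())

# Keep weeklyTotal only at run starts (False * x == 0), then accumulate.
df['CumulativeTotal'] = m.mul(df['weeklyTotal']).cumsum()
print(df['CumulativeTotal'].tolist())  # [1, 1, 1, 4, 4, 4, 6, 6, 6]

# Hypothetical interleaved frame (an assumption, not from the question):
# A reappears, so it starts a new run and is counted a second time.
mixed = pd.DataFrame({'group': ['A', 'B', 'A'], 'weeklyTotal': [1, 3, 1]})
starts = mixed['group'].ne(mixed['group'].shift())
print(starts.mul(mixed['weeklyTotal']).cumsum().tolist())  # [1, 4, 5]
```

If the groups may be interleaved, sorting by group first or using the drop_duplicates and map route is likely the safer choice.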

huangapple
  • Posted on 2023-05-22 23:49:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76307905.html