Pandas - Scale within a group using a value from the group

Question

My data consist of groups which have received a variety of treatments and then had some result measured, similar to this:

    import pandas as pd

    X = pd.DataFrame({
        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
        'treatment': ['control', 'high_dose', 'low_dose', 'control', 'high_dose', 'low_dose'],
        'result': [2, 6, 4, 3, 12, 15]})

      group  treatment  result
    0     A    control       2
    1     A  high_dose       6
    2     A   low_dose       4
    3     B    control       3
    4     B  high_dose      12
    5     B   low_dose      15

I would like to scale the results within each group by that group's control value, to get a result like this:

      group  treatment  result  result_group_stand
    0     A    control       2                   1
    1     A  high_dose       6                   3
    2     A   low_dose       4                   2
    3     B    control       3                   1
    4     B  high_dose      12                   4
    5     B   low_dose      15                   5

Here every result in group "A" has been divided by its control value of 2, and every value in group "B" by its control value of 3. All of the examples I have seen use groupby to scale by a summary measurement (sum, max, min, etc.), but I can't find one that uses the value of a specific treatment within the group. Thanks for any help.
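Written as a formula (the notation is introduced here only for illustration): for row i with group g_i, the new column divides the row's result by the result of the control row of the same group.

```latex
\text{result\_group\_stand}_i = \frac{\text{result}_i}{\text{result}_{\mathrm{control}}(g_i)},
\qquad \text{e.g. in group B: } \frac{12}{3} = 4,\quad \frac{15}{3} = 5
```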

Answer 1

Score: 2

Build a mapping from each group to its control value with boolean indexing and set_index, then apply it with map:

    X['result_group_stand'] = (X['result']
        .div(X['group']
             .map(X[X['treatment'].eq('control')]
                  .set_index('group')['result'])
        )
    )
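To make the map step easier to follow, here is a minimal sketch that splits the one-liner into named intermediates; control_by_group and divisor are illustrative names introduced here, not part of the original answer.

```python
import pandas as pd

X = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'treatment': ['control', 'high_dose', 'low_dose', 'control', 'high_dose', 'low_dose'],
    'result': [2, 6, 4, 3, 12, 15]})

# Keep only the control rows and index them by group:
# a Series mapping group label -> control value (A -> 2, B -> 3).
control_by_group = X[X['treatment'].eq('control')].set_index('group')['result']

# Series.map with a Series looks each row's group up in that index,
# giving one divisor per row, aligned with X.
divisor = X['group'].map(control_by_group)

X['result_group_stand'] = X['result'].div(divisor)
print(X)
```

A group with no control row gets NaN from map, so its scaled values are NaN. If a group could have more than one control row, the lookup index would contain duplicates, so it is worth checking control_by_group.index.is_unique first.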

Or with groupby.transform, masking the non-control results with where and broadcasting each group's first remaining value:

    X['result_group_stand'] = (X['result']
        .div(X['result'].where(X['treatment'].eq('control'))
             .groupby(X['group']).transform('first')
        )
    )
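The same steps, broken out as a sketch with illustrative intermediate names (control_only and divisor are not from the original answer), show why 'first' picks the control value: where turns every non-control result into NaN, and GroupBy.first skips NaN.

```python
import pandas as pd

X = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'treatment': ['control', 'high_dose', 'low_dose', 'control', 'high_dose', 'low_dose'],
    'result': [2, 6, 4, 3, 12, 15]})

# Keep result only where treatment is 'control'; every other row becomes NaN.
control_only = X['result'].where(X['treatment'].eq('control'))

# 'first' returns each group's first non-null value (the control value),
# and transform broadcasts it back to every row of that group.
divisor = control_only.groupby(X['group']).transform('first')

X['result_group_stand'] = X['result'].div(divisor)
print(X)
```

One design difference from the map version: if a group had several control rows, this approach would quietly use the first one in row order rather than needing a unique lookup index.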

Output:

      group  treatment  result  result_group_stand
    0     A    control       2                 1.0
    1     A  high_dose       6                 3.0
    2     A   low_dose       4                 2.0
    3     B    control       3                 1.0
    4     B  high_dose      12                 4.0
    5     B   low_dose      15                 5.0
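As a quick sanity check, this sketch runs both approaches side by side and adds a hypothetical group "C" that has no control row (not part of the original data) to show that both then produce NaN for that group rather than raising.

```python
import pandas as pd

X = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'treatment': ['control', 'high_dose', 'low_dose',
                  'control', 'high_dose', 'low_dose', 'high_dose'],
    'result': [2, 6, 4, 3, 12, 15, 7]})  # group C is hypothetical: no control row

is_control = X['treatment'].eq('control')

# Approach 1: group -> control value lookup, applied with map.
via_map = X['result'].div(
    X['group'].map(X[is_control].set_index('group')['result']))

# Approach 2: mask non-control rows, then take each group's first non-null value.
via_transform = X['result'].div(
    X['result'].where(is_control).groupby(X['group']).transform('first'))

# Both agree, including NaN for group C; the Series names differ, so ignore them.
pd.testing.assert_series_equal(via_map, via_transform, check_names=False)
print(via_map.tolist())
```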

huangapple
  • This article was published on June 1, 2023 at 20:43:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76381996.html