Pandas - Scale within a group using a value from the group
Question
My data consist of groups which have received a variety of treatments and then had some result measured, similar to this:
import pandas as pd

X = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'treatment': ['control', 'high_dose', 'low_dose', 'control', 'high_dose', 'low_dose'],
    'result': [2, 6, 4, 3, 12, 15]})
  group  treatment  result
0     A    control       2
1     A  high_dose       6
2     A   low_dose       4
3     B    control       3
4     B  high_dose      12
5     B   low_dose      15
I would like to scale the results within each group using the control value within each group to achieve a result like this:
  group  treatment  result  result_group_stand
0     A    control       2                   1
1     A  high_dose       6                   3
2     A   low_dose       4                   2
3     B    control       3                   1
4     B  high_dose      12                   4
5     B   low_dose      15                   5
Here, every result in group "A" has been divided by that group's control value of 2, and every result in group "B" by its control value of 3. All of the examples I have seen use groupby to scale by a summary statistic (sum, max, min, etc.), but I can't find one that uses the value of a specific treatment within the group. Thanks for any help.
Answer 1
Score: 2
Use a mapping with boolean indexing, set_index and map:
X['result_group_stand'] = (X['result']
    .div(X['group']
         .map(X[X['treatment'].eq('control')]
              .set_index('group')['result'])
    )
)
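To see what the chained expression does, the lookup can be split into named steps. This is only a sketch for inspection: the names `controls` and `per_row_control` are illustrative and not part of the answer, and it assumes each group has exactly one control row.

```python
import pandas as pd

X = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'treatment': ['control', 'high_dose', 'low_dose',
                  'control', 'high_dose', 'low_dose'],
    'result': [2, 6, 4, 3, 12, 15]})

# Keep only the control rows and index them by group, giving a
# Series that maps each group label to its control result.
controls = X[X['treatment'].eq('control')].set_index('group')['result']

# Look up each row's group in that Series, so every row carries
# the control value of its own group.
per_row_control = X['group'].map(controls)

# Divide each result by its group's control value.
X['result_group_stand'] = X['result'].div(per_row_control)
```

A group with no control row would get NaN from the map step, so its scaled results would be NaN as well. If a group could contain more than one control row, the lookup index would no longer be unique and this approach would likely need a deduplication step first.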
Or with groupby.transform:
X['result_group_stand'] = (X['result']
    .div(X['result'].where(X['treatment'].eq('control'))
         .groupby(X['group']).transform('first')
    )
)
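The transform variant can be unpacked the same way. In this sketch, `control_only` and `per_row_control` are illustrative names rather than anything from the answer. It relies on the groupby `'first'` aggregation picking the first non-null value in each group.

```python
import pandas as pd

X = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'treatment': ['control', 'high_dose', 'low_dose',
                  'control', 'high_dose', 'low_dose'],
    'result': [2, 6, 4, 3, 12, 15]})

# Replace every non-control result with NaN, so only the
# control values survive in this Series.
control_only = X['result'].where(X['treatment'].eq('control'))

# Within each group take the first non-null value (the control)
# and broadcast it back to every row of that group.
per_row_control = control_only.groupby(X['group']).transform('first')

# Divide each result by its group's control value.
X['result_group_stand'] = X['result'].div(per_row_control)
```

Because the non-control rows are masked out before grouping, the control row does not need to come first within its group. If a group held several control rows, this version would silently use the first one.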
Output:
  group  treatment  result  result_group_stand
0     A    control       2                 1.0
1     A  high_dose       6                 3.0
2     A   low_dose       4                 2.0
3     B    control       3                 1.0
4     B  high_dose      12                 4.0
5     B   low_dose      15                 5.0