June 1, 2023, 20:43:35

Pandas - Scale within a group using a value from the group

Question

My data consist of groups which have received a variety of treatments and then had some result measured, similar to this:

import pandas as pd

X = pd.DataFrame({
    'group':['A','A','A','B','B','B'],
    'treatment':['control', 'high_dose', 'low_dose', 'control', 'high_dose', 'low_dose'],
    'result':[2, 6, 4, 3, 12, 15]})

  group  treatment  result
0     A    control       2
1     A  high_dose       6
2     A   low_dose       4
3     B    control       3
4     B  high_dose      12
5     B   low_dose      15

I would like to scale the results within each group using the control value within each group to achieve a result like this:

  group  treatment  result  result_group_stand
0     A    control       2                    1
1     A  high_dose       6                    3
2     A   low_dose       4                    2
3     B    control       3                    1
4     B  high_dose      12                    4
5     B   low_dose      15                    5

Here, every result in group "A" has been scaled by that group's control value of 2, and every result in group "B" by its control value of 3. All of the examples I have seen use groupby to scale by a summary statistic (sum, max, min, etc.), but I can't find an example that uses the value of a specific treatment within the group. Thanks for any help.

Answer 1

Score: 2

Use a mapping with boolean indexing, set_index and map:

X['result_group_stand'] = (X['result']
                           .div(X['group']
                                .map(X[X['treatment'].eq('control')]
                                      .set_index('group')['result'])
                               )
                          )
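
To make the chained expression above easier to follow, here is a minimal sketch that splits it into named steps. The intermediate names (`controls`, `control_by_group`, `denominator`) are illustrative and not part of the original answer, and the sketch assumes each group has exactly one control row.

```python
import pandas as pd

X = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'treatment': ['control', 'high_dose', 'low_dose',
                  'control', 'high_dose', 'low_dose'],
    'result': [2, 6, 4, 3, 12, 15]})

# Step 1: keep only the control rows (boolean indexing).
controls = X[X['treatment'].eq('control')]

# Step 2: build a lookup Series whose index is the group label
# and whose values are that group's control result.
control_by_group = controls.set_index('group')['result']

# Step 3: for every row, look up the control value of its group.
denominator = X['group'].map(control_by_group)

# Step 4: divide each result by its group's control value.
X['result_group_stand'] = X['result'].div(denominator)
```

Because the lookup Series is indexed by group label, this approach relies on the control rows being unique per group; with duplicate control rows the lookup would be ambiguous.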

Or with groupby.transform:

X['result_group_stand'] = (X['result']
                           .div(X['result'].where(X['treatment'].eq('control'))
                                .groupby(X['group']).transform('first')
                               )
                          )

Output:

  group  treatment  result  result_group_stand
0     A    control       2                 1.0
1     A  high_dose       6                 3.0
2     A   low_dose       4                 2.0
3     B    control       3                 1.0
4     B  high_dose      12                 4.0
5     B   low_dose      15                 5.0
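
As a rough illustration of how the groupby.transform variant behaves on less tidy data, the sketch below uses a made-up frame (not from the question) in which the control row of group "B" is not the first row and group "C" has no control at all. The names `masked` and `control` are illustrative only.

```python
import pandas as pd

# Hypothetical data: B's control is listed second, C has no control row.
X = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C'],
    'treatment': ['control', 'high_dose', 'low_dose', 'control', 'high_dose'],
    'result': [2, 6, 15, 3, 8]})

# where() keeps the result on control rows and sets every other row to NaN.
masked = X['result'].where(X['treatment'].eq('control'))

# 'first' takes the first non-null value in each group, and transform()
# broadcasts it back to every row of that group.
control = masked.groupby(X['group']).transform('first')

X['result_group_stand'] = X['result'].div(control)
```

Under these assumptions the row order inside a group does not matter, and a group with no control row ends up with NaN in `result_group_stand` rather than raising an error, so it may be worth checking for missing values afterwards.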
