2023-07-28 06:02:15

R: Get values from separate dataframes based on the running values of a group_by function

Question

I have the following dataframe.

Col1 = c("A1", "A1", "A2", "A2")
Col2 = c("B1", "B1", "B2", "B2")
Value = c(10, 20, 30, 40)
df = data.frame(Col1, Col2, Value)

This dataframe holds various observations: two factor columns and one value column. The same group (a Col1/Col2 combination) can appear in several rows with different values, and there are multiple dataframes of this shape.

MinCol1 = c("A1", "A2")
MinCol2 = c("B1", "B2")
MinValue = c(1, 1)
mins = data.frame(MinCol1, MinCol2, MinValue)
MaxCol1 = c("A1", "A2")
MaxCol2 = c("B1", "B2")
MaxValue = c(100, 100)
maxes = data.frame(MaxCol1, MaxCol2, MaxValue)

The above two dataframes are the minimum and maximum values for all groups (Col1 and Col2) across all dataframes (like the 1st one, df).

I want to normalize the values of dataframes like the 1st one per group. The new values should lie between 0 and 1, but the range used for the normalization should be taken from the mins and maxes dataframes.

normalizeDataForAllBenchmarks = function(df, mins, maxes) {
    
    ### Normalize metrics [0,1]
    df_normal = df %>%
      group_by(Process, Category, Metric) %>%
      mutate(Value = rescale(Value, to = c(0,1), from = range(...)))
    
    return(df_normal)
}

I have the above function, but I'm not sure what goes in the range function in order to do a per-group lookup into the mins and maxes dataframes.

Answer 1

Score: 2

All you need to do is join the data on the group ids and then compute the normalized value from the joined min and max:

library(tidyverse)
normalizeDataForAllBenchmarks = function(df, mins, maxes) {
    # Attach each group's MinValue and MaxValue to the rows of df
    left_join(df, mins, by = c("Col1" = "MinCol1", "Col2" = "MinCol2")) |>
    left_join(maxes, by = c("Col1" = "MaxCol1", "Col2" = "MaxCol2")) |>
    # Min-max normalize against the looked-up range
    mutate(across(Value:MaxValue, as.numeric),
           Value = (Value - MinValue) / (MaxValue - MinValue)) |>
    # Drop the helper columns
    select(-c(MinValue, MaxValue))
}
normalizeDataForAllBenchmarks(df, mins, maxes)
#>   Col1 Col2      Value
#> 1   A1   B1 0.09090909
#> 2   A1   B1 0.19191919
#> 3   A2   B2 0.29292929
#> 4   A2   B2 0.39393939
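
For comparison, the same per-group lookup can be sketched in base R without any packages. This is only an illustration, not part of the original answer: the function name normalize_by_lookup is made up here, and it assumes the group labels never contain the "|" character used to build a combined key. Rows keep their original order, and a group missing from mins or maxes would yield NA. For the first row the arithmetic is (10 - 1) / (100 - 1) = 9 / 99, which matches the output above.

```r
# Example data from the question
df    <- data.frame(Col1 = c("A1", "A1", "A2", "A2"),
                    Col2 = c("B1", "B1", "B2", "B2"),
                    Value = c(10, 20, 30, 40))
mins  <- data.frame(MinCol1 = c("A1", "A2"), MinCol2 = c("B1", "B2"),
                    MinValue = c(1, 1))
maxes <- data.frame(MaxCol1 = c("A1", "A2"), MaxCol2 = c("B1", "B2"),
                    MaxValue = c(100, 100))

# Hypothetical helper: look up each row's group range, then min-max normalize
normalize_by_lookup <- function(df, mins, maxes) {
  # Build one combined key per row (assumes "|" never occurs in the labels)
  key     <- paste(df$Col1, df$Col2, sep = "|")
  min_key <- paste(mins$MinCol1, mins$MinCol2, sep = "|")
  max_key <- paste(maxes$MaxCol1, maxes$MaxCol2, sep = "|")

  # match() returns, for each row of df, the position of its group in the lookup table
  lo <- as.numeric(mins$MinValue[match(key, min_key)])
  hi <- as.numeric(maxes$MaxValue[match(key, max_key)])

  df$Value <- (df$Value - lo) / (hi - lo)
  df
}

out <- normalize_by_lookup(df, mins, maxes)
print(round(out$Value, 4))
#> [1] 0.0909 0.1919 0.2929 0.3939
```

The join-based version above is probably easier to read when the lookup tables carry more columns; the match() version mainly avoids the tidyverse dependency.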
