R: Get values from separate dataframes based on the running values of a group_by function

Question

I have the following dataframe.

    Col1 = c("A1", "A1", "A2", "A2")
    Col2 = c("B1", "B1", "B2", "B2")
    Value = c(10, 20, 30, 40)
    df = data.frame(Col1, Col2, Value)

This dataframe holds various observations: two factor columns and a value column. The same group can appear in multiple rows with different values, and there are several such dataframes with similar observations.

    MinCol1 = c("A1", "A2")
    MinCol2 = c("B1", "B2")
    MinValue = c(1, 1)
    mins = data.frame(MinCol1, MinCol2, MinValue)
    MaxCol1 = c("A1", "A2")
    MaxCol2 = c("B1", "B2")
    MaxValue = c(100, 100)
    maxes = data.frame(MaxCol1, MaxCol2, MaxValue)

The two dataframes above hold the minimum and maximum values for each group (each combination of Col1 and Col2) across all dataframes (like the first one, df).

I want to normalize the values of dataframes like the first one per group. The new values should lie between 0 and 1, but the range to normalize against should be taken from the mins and maxes dataframes rather than from the data itself.
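To make the goal concrete, here is a minimal sketch of the intended transformation for a single group, assuming plain min-max scaling against the reference bounds. The variable names `value`, `group_min`, `group_max` and `normalized` are illustrative only and do not appear in the original code.

```r
# Values observed for group (A1, B1) in df
value <- c(10, 20)

# Reference bounds for that group, as listed in mins and maxes
group_min <- 1
group_max <- 100

# Min-max scaling against the reference bounds, not the observed range
normalized <- (value - group_min) / (group_max - group_min)
print(normalized)
#> [1] 0.09090909 0.19191919
```

Scaling against the observed range of the group would instead map 10 to 0 and 20 to 1, which is why the bounds have to be looked up per group.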

    normalizeDataForAllBenchmarks = function(df, mins, maxes) {
      ### Normalize metrics [0,1]
      df_normal = df %>%
        group_by(Col1, Col2) %>%
        mutate(Value = rescale(Value, to = c(0,1), from = range(...)))
      return(df_normal)
    }

I have the above function, but I'm not sure what goes in the range function in order to do a per-group lookup into the mins and maxes dataframes.

Answer 1

Score: 2

All you need to do is join the three dataframes on the group columns and then compute the normalized value:

    library(tidyverse)

    normalizeDataForAllBenchmarks = function(df, mins, maxes) {
      # Attach each group's MinValue and MaxValue to the matching rows of df
      left_join(df, mins, by = c("Col1" = "MinCol1", "Col2" = "MinCol2")) |>
        left_join(maxes, by = c("Col1" = "MaxCol1", "Col2" = "MaxCol2")) |>
        # Make sure the three value columns are numeric, then min-max scale
        mutate(across(Value:MaxValue, as.numeric),
               Value = (Value - MinValue) / (MaxValue - MinValue)) |>
        # Drop the helper columns again
        select(-c(MinValue, MaxValue))
    }

    normalizeDataForAllBenchmarks(df, mins, maxes)
    #>   Col1 Col2      Value
    #> 1   A1   B1 0.09090909
    #> 2   A1   B1 0.19191919
    #> 3   A2   B2 0.29292929
    #> 4   A2   B2 0.39393939
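
The same join-then-normalize idea can also be sketched in base R, which may be handy where the tidyverse is not available. This is only a sketch under assumptions: the function name `normalize_base`, the helper column `.row`, and the decision to stop when a group has no reference bounds are illustrative choices, not part of the answer above.

```r
df <- data.frame(
  Col1 = c("A1", "A1", "A2", "A2"),
  Col2 = c("B1", "B1", "B2", "B2"),
  Value = c(10, 20, 30, 40)
)
mins <- data.frame(
  MinCol1 = c("A1", "A2"), MinCol2 = c("B1", "B2"), MinValue = c(1, 1)
)
maxes <- data.frame(
  MaxCol1 = c("A1", "A2"), MaxCol2 = c("B1", "B2"), MaxValue = c(100, 100)
)

# Hypothetical base-R counterpart of the dplyr pipeline
normalize_base <- function(df, mins, maxes) {
  # merge() may reorder rows, so remember the original order
  df$.row <- seq_len(nrow(df))

  # Left joins on the group columns (all.x = TRUE keeps every row of df)
  out <- merge(df, mins, by.x = c("Col1", "Col2"),
               by.y = c("MinCol1", "MinCol2"), all.x = TRUE)
  out <- merge(out, maxes, by.x = c("Col1", "Col2"),
               by.y = c("MaxCol1", "MaxCol2"), all.x = TRUE)

  # Assumption: a group without reference bounds is treated as an error
  if (anyNA(out$MinValue) || anyNA(out$MaxValue)) {
    stop("some groups in df have no entry in mins or maxes")
  }

  # Min-max scaling against the per-group reference bounds
  out$Value <- (out$Value - out$MinValue) / (out$MaxValue - out$MinValue)

  # Restore the original row order and drop the helper columns
  out <- out[order(out$.row), c("Col1", "Col2", "Value")]
  rownames(out) <- NULL
  out
}

result <- normalize_base(df, mins, maxes)
print(result)
```

With `all.x = TRUE`, a group missing from `mins` or `maxes` would otherwise yield `NA` silently; the explicit check makes that case visible.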

huangapple
  • Posted on July 28, 2023 at 06:02:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76783671.html