在R中计算一组值与该组的参考值之间的差异。

huangapple go评论115阅读模式
英文:

Calculate the difference between all values in a group and a reference value for the group in R

问题

我有一个像这样的数据框:

df1 <- data.frame(ID = c(1,1,1,1,1,2,2,2,2,2,2),
                  Var1 = c(23, 41, 32, 58, 60,12,34,55,49,60,64),
                  Var2 = c(4,5,2,1,7,3,4,5,5,6,9))
   ID Var1 Var2
1   1   23    4
2   1   41    5
3   1   32    2
4   1   58    1
5   1   60    7
6   2   12    3
7   2   34    4
8   2   55    5
9   2   49    5
10  2   60    6
11  2   64    9

还有另一个数据框:

df2 <- data.frame(ID = c(1,2),
                  Var1 = c(20,67),
                  Var2 = c(5,3))

  ID Var1 Var2
1  1   20    5
2  2   67    3

我想在第一个数据框中添加一列,该列为每个ID组的两个数据框的列之间的绝对差异。也就是说,应该从第一个数据框中ID等于1的每个Var1值中减去20(即第二个数据框中ID等于1Var1值),然后将绝对值放入添加的列中。

我找到了一种解决方案,但它相当繁琐,因为我有一个更大的数据框,其中有许多变量:
首先,我将两个数据框按ID列合并,然后添加另一列计算绝对差异:

df3 <- merge(df1,df2,by="ID")
df3$Var1_Diff <- abs(df3$Var1.x - df3$Var1.y)
print(df3)

这样,我得到了一个像这样的丑陋的数据框:

       ID Var1.x Var2.x Var1.y Var2.y Var1_Diff
1   1     23      4     20      5         3
2   1     41      5     20      5        21
3   1     32      2     20      5        12
4   1     58      1     20      5        38
5   1     60      7     20      5        40
6   2     12      3     67      3        55
7   2     34      4     67      3        33
8   2     55      5     67      3        12
9   2     49      5     67      3        18
10  2     60      6     67      3         7
11  2     64      9     67      3         3

有人能提供一个更简单的解决方案吗?在第一个数据框中只需添加一列即可。

另外,由于我需要对许多Var列进行此过程,如果解决方案也能考虑到这一点,那将非常棒。

谢谢!

英文:

I have a dataframe like this:

df1 &lt;- data.frame(ID = c(1,1,1,1,1,2,2,2,2,2,2),
                  Var1 = c(23, 41, 32, 58, 60,12,34,55,49,60,64),
                  Var2 = c(4,5,2,1,7,3,4,5,5,6,9))
   ID Var1 Var2
1   1   23    4
2   1   41    5
3   1   32    2
4   1   58    1
5   1   60    7
6   2   12    3
7   2   34    4
8   2   55    5
9   2   49    5
10  2   60    6
11  2   64    9

and another one like this:

df2 &lt;- data.frame(ID = c(1,2),
                  Var1 = c(20,67),
                  Var2 = c(5,3))

  ID Var1 Var2
1  1   20    5
2  2   67    3

I want to add a column to the first dataframe with the absolute difference between the columns of the two dataframes for every ID group. That is 20 (i.e., the Var1 value with ID equal to 1 in the second dataframe) should be subtracted from every Var1 value with an ID equal to 1 in the first dataframe and then the absolute value should be given in an added column.

I figured out one solution which is rather cumbersome as I have a much larger dataframe with many variables:
First, I would merge the two dataframes by the ID column and then add another column calculating the absolute difference:

df3 &lt;- merge(df1,df2,by=&quot;ID&quot;)
df3$Var1_Diff &lt;- abs(df3$Var1.x - df3$Var1.y)
print(df3)

So that I end up with a ugly dataframe like this:

       ID Var1.x Var2.x Var1.y Var2.y Var1_Diff
1   1     23      4     20      5         3
2   1     41      5     20      5        21
3   1     32      2     20      5        12
4   1     58      1     20      5        38
5   1     60      7     20      5        40
6   2     12      3     67      3        55
7   2     34      4     67      3        33
8   2     55      5     67      3        12
9   2     49      5     67      3        18
10  2     60      6     67      3         7
11  2     64      9     67      3         3

Can anyone suggest a simpler solution where I just add a column to the first dataframe?

Also, as I have do this process for many Var colums it would be awesome if the solution could also take that into consideration.

Thank you!

答案1

得分: 1

这应该可以很好地扩展:

library(dplyr)
bind_rows(list(keep = df1, ref = df2), .id = "source") %>%
  mutate(
    across(starts_with("Var"), \(x) abs(x - x[source == "ref"]),
           .names = "{.col}_diff"),
    .by = ID
  ) %>%
  filter(source == "keep") %>%
  select(-source)
#    ID Var1 Var2 Var1_diff Var2_diff
# 1   1   23    4         3         1
# 2   1   41    5        21         0
# 3   1   32    2        12         3
# 4   1   58    1        38         4
# 5   1   60    7        40         2
# 6   2   12    3        55         0
# 7   2   34    4        33         1
# 8   2   55    5        12         2
# 9   2   49    5        18         2
# 10  2   60    6         7         3
# 11  2   64    9         3         6
英文:

This should scale up nicely:

library(dplyr)
bind_rows(list(keep = df1, ref = df2), .id = &quot;source&quot;) |&gt;
  mutate(
    across(starts_with(&quot;Var&quot;), \(x) abs(x - x[source == &quot;ref&quot;]),
           .names = &quot;{.col}_diff&quot;),
    .by = ID
  ) |&gt;
  filter(source == &quot;keep&quot;) |&gt;
  select(-source)
#    ID Var1 Var2 Var1_diff Var2_diff
# 1   1   23    4         3         1
# 2   1   41    5        21         0
# 3   1   32    2        12         3
# 4   1   58    1        38         4
# 5   1   60    7        40         2
# 6   2   12    3        55         0
# 7   2   34    4        33         1
# 8   2   55    5        12         2
# 9   2   49    5        18         2
# 10  2   60    6         7         3
# 11  2   64    9         3         6

答案2

得分: 1

你可以使用data.table的连接操作来跳过一些中间步骤:

library(data.table)

setDT(df1)[
  setDT(df2),
  `:=`(diff1 = abs(i.Var1 - Var1), diff2 = abs(i.Var2 - Var2)),
  on = "ID"
][]
#>     ID Var1 Var2 diff1 diff2
#>  1:  1   23    4     3     1
#>  2:  1   41    5    21     0
#>  3:  1   32    2    12     3
#>  4:  1   58    1    38     4
#>  5:  1   60    7    40     2
#>  6:  2   12    3    55     0
#>  7:  2   34    4    33     1
#>  8:  2   55    5    12     2
#>  9:  2   49    5    18     2
#> 10:  2   60    6     7     3
#> 11:  2   64    9     3     6

这段代码使用setDTdf1df2转换为data.table对象,然后使用:=操作符在连接的基础上计算了diff1diff2两列的差值,并将结果存储在原始数据表中。最后,使用[]运算符将结果返回。

英文:

You can skip some of the intermediates with a data.table join:

library(data.table)

setDT(df1)[
  setDT(df2),
  `:=`(diff1 = abs(i.Var1 - Var1), diff2 = abs(i.Var2 - Var2)),
  on = &quot;ID&quot;
][]
#&gt;     ID Var1 Var2 diff1 diff2
#&gt;  1:  1   23    4     3     1
#&gt;  2:  1   41    5    21     0
#&gt;  3:  1   32    2    12     3
#&gt;  4:  1   58    1    38     4
#&gt;  5:  1   60    7    40     2
#&gt;  6:  2   12    3    55     0
#&gt;  7:  2   34    4    33     1
#&gt;  8:  2   55    5    12     2
#&gt;  9:  2   49    5    18     2
#&gt; 10:  2   60    6     7     3
#&gt; 11:  2   64    9     3     6

huangapple
  • 本文由 发表于 2023年8月9日 02:28:27
  • 转载请务必保留本文链接:https://go.coder-hub.com/76862277.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定