英文:
Calculate the difference between all values in a group and a reference value for the group in R
问题
我有一个像这样的数据框:
df1 <- data.frame(ID = c(1,1,1,1,1,2,2,2,2,2,2),
Var1 = c(23, 41, 32, 58, 60,12,34,55,49,60,64),
Var2 = c(4,5,2,1,7,3,4,5,5,6,9))
ID Var1 Var2
1 1 23 4
2 1 41 5
3 1 32 2
4 1 58 1
5 1 60 7
6 2 12 3
7 2 34 4
8 2 55 5
9 2 49 5
10 2 60 6
11 2 64 9
还有另一个数据框:
df2 <- data.frame(ID = c(1,2),
Var1 = c(20,67),
Var2 = c(5,3))
ID Var1 Var2
1 1 20 5
2 2 67 3
我想在第一个数据框中添加一列,该列为每个ID
组的两个数据框的列之间的绝对差异。也就是说,应该从第一个数据框中ID
等于1
的每个Var1
值中减去20(即第二个数据框中ID
等于1
的Var1
值),然后将绝对值放入添加的列中。
我找到了一种解决方案,但它相当繁琐,因为我有一个更大的数据框,其中有许多变量:
首先,我将两个数据框按ID
列合并,然后添加另一列计算绝对差异:
df3 <- merge(df1,df2,by="ID")
df3$Var1_Diff <- abs(df3$Var1.x - df3$Var1.y)
print(df3)
这样,我得到了一个像这样的丑陋的数据框:
ID Var1.x Var2.x Var1.y Var2.y Var1_Diff
1 1 23 4 20 5 3
2 1 41 5 20 5 21
3 1 32 2 20 5 12
4 1 58 1 20 5 38
5 1 60 7 20 5 40
6 2 12 3 67 3 55
7 2 34 4 67 3 33
8 2 55 5 67 3 12
9 2 49 5 67 3 18
10 2 60 6 67 3 7
11 2 64 9 67 3 3
有人能提供一个更简单的解决方案吗?在第一个数据框中只需添加一列即可。
另外,由于我需要对许多Var
列进行此过程,如果解决方案也能考虑到这一点,那将非常棒。
谢谢!
英文:
I have a dataframe like this:
df1 <- data.frame(ID = c(1,1,1,1,1,2,2,2,2,2,2),
Var1 = c(23, 41, 32, 58, 60,12,34,55,49,60,64),
Var2 = c(4,5,2,1,7,3,4,5,5,6,9))
ID Var1 Var2
1 1 23 4
2 1 41 5
3 1 32 2
4 1 58 1
5 1 60 7
6 2 12 3
7 2 34 4
8 2 55 5
9 2 49 5
10 2 60 6
11 2 64 9
and another one like this:
df2 <- data.frame(ID = c(1,2),
Var1 = c(20,67),
Var2 = c(5,3))
ID Var1 Var2
1 1 20 5
2 2 67 3
I want to add a column to the first dataframe with the absolute difference between the columns of the two dataframes for every ID
group. That is 20 (i.e., the Var1
value with ID
equal to 1
in the second dataframe) should be subtracted from every Var1
value with an ID
equal to 1
in the first dataframe and then the absolute value should be given in an added column.
I figured out one solution which is rather cumbersome as I have a much larger dataframe with many variables:
First, I would merge the two dataframes by the ID
column and then add another column calculating the absolute difference:
df3 <- merge(df1,df2,by="ID")
df3$Var1_Diff <- abs(df3$Var1.x - df3$Var1.y)
print(df3)
So that I end up with a ugly dataframe like this:
ID Var1.x Var2.x Var1.y Var2.y Var1_Diff
1 1 23 4 20 5 3
2 1 41 5 20 5 21
3 1 32 2 20 5 12
4 1 58 1 20 5 38
5 1 60 7 20 5 40
6 2 12 3 67 3 55
7 2 34 4 67 3 33
8 2 55 5 67 3 12
9 2 49 5 67 3 18
10 2 60 6 67 3 7
11 2 64 9 67 3 3
Can anyone suggest a simpler solution where I just add a column to the first dataframe?
Also, as I have do this process for many Var
colums it would be awesome if the solution could also take that into consideration.
Thank you!
答案1
得分: 1
这应该可以很好地扩展:
library(dplyr)
bind_rows(list(keep = df1, ref = df2), .id = "source") %>%
mutate(
across(starts_with("Var"), \(x) abs(x - x[source == "ref"]),
.names = "{.col}_diff"),
.by = ID
) %>%
filter(source == "keep") %>%
select(-source)
# ID Var1 Var2 Var1_diff Var2_diff
# 1 1 23 4 3 1
# 2 1 41 5 21 0
# 3 1 32 2 12 3
# 4 1 58 1 38 4
# 5 1 60 7 40 2
# 6 2 12 3 55 0
# 7 2 34 4 33 1
# 8 2 55 5 12 2
# 9 2 49 5 18 2
# 10 2 60 6 7 3
# 11 2 64 9 3 6
英文:
This should scale up nicely:
library(dplyr)
bind_rows(list(keep = df1, ref = df2), .id = "source") |>
mutate(
across(starts_with("Var"), \(x) abs(x - x[source == "ref"]),
.names = "{.col}_diff"),
.by = ID
) |>
filter(source == "keep") |>
select(-source)
# ID Var1 Var2 Var1_diff Var2_diff
# 1 1 23 4 3 1
# 2 1 41 5 21 0
# 3 1 32 2 12 3
# 4 1 58 1 38 4
# 5 1 60 7 40 2
# 6 2 12 3 55 0
# 7 2 34 4 33 1
# 8 2 55 5 12 2
# 9 2 49 5 18 2
# 10 2 60 6 7 3
# 11 2 64 9 3 6
答案2
得分: 1
你可以使用data.table
的连接操作来跳过一些中间步骤:
library(data.table)
setDT(df1)[
setDT(df2),
`:=`(diff1 = abs(i.Var1 - Var1), diff2 = abs(i.Var2 - Var2)),
on = "ID"
][]
#> ID Var1 Var2 diff1 diff2
#> 1: 1 23 4 3 1
#> 2: 1 41 5 21 0
#> 3: 1 32 2 12 3
#> 4: 1 58 1 38 4
#> 5: 1 60 7 40 2
#> 6: 2 12 3 55 0
#> 7: 2 34 4 33 1
#> 8: 2 55 5 12 2
#> 9: 2 49 5 18 2
#> 10: 2 60 6 7 3
#> 11: 2 64 9 3 6
这段代码使用setDT
将df1
和df2
转换为data.table
对象,然后使用:=
操作符在连接的基础上计算了diff1
和diff2
两列的差值,并将结果存储在原始数据表中。最后,使用[]
运算符将结果返回。
英文:
You can skip some of the intermediates with a data.table
join:
library(data.table)
setDT(df1)[
setDT(df2),
`:=`(diff1 = abs(i.Var1 - Var1), diff2 = abs(i.Var2 - Var2)),
on = "ID"
][]
#> ID Var1 Var2 diff1 diff2
#> 1: 1 23 4 3 1
#> 2: 1 41 5 21 0
#> 3: 1 32 2 12 3
#> 4: 1 58 1 38 4
#> 5: 1 60 7 40 2
#> 6: 2 12 3 55 0
#> 7: 2 34 4 33 1
#> 8: 2 55 5 12 2
#> 9: 2 49 5 18 2
#> 10: 2 60 6 7 3
#> 11: 2 64 9 3 6
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论