2023年8月9日 02:28:27go评论140阅读模式

英文:

Calculate the difference between all values in a group and a reference value for the group in R

问题

我有一个像这样的数据框：

df1 <- data.frame(ID = c(1,1,1,1,1,2,2,2,2,2,2),
                  Var1 = c(23, 41, 32, 58, 60,12,34,55,49,60,64),
                  Var2 = c(4,5,2,1,7,3,4,5,5,6,9))
   ID Var1 Var2
1   1   23    4
2   1   41    5
3   1   32    2
4   1   58    1
5   1   60    7
6   2   12    3
7   2   34    4
8   2   55    5
9   2   49    5
10  2   60    6
11  2   64    9

还有另一个数据框：

df2 <- data.frame(ID = c(1,2),
                  Var1 = c(20,67),
                  Var2 = c(5,3))
  ID Var1 Var2
1  1   20    5
2  2   67    3

我想在第一个数据框中添加一列，该列为每个ID组的两个数据框的列之间的绝对差异。也就是说，应该从第一个数据框中ID等于1的每个Var1值中减去20（即第二个数据框中ID等于1的Var1值），然后将绝对值放入添加的列中。

我找到了一种解决方案，但它相当繁琐，因为我有一个更大的数据框，其中有许多变量：
首先，我将两个数据框按ID列合并，然后添加另一列计算绝对差异：

df3 <- merge(df1,df2,by="ID")
df3$Var1_Diff <- abs(df3$Var1.x - df3$Var1.y)
print(df3)

这样，我得到了一个像这样的丑陋的数据框：

       ID Var1.x Var2.x Var1.y Var2.y Var1_Diff
1   1     23      4     20      5         3
2   1     41      5     20      5        21
3   1     32      2     20      5        12
4   1     58      1     20      5        38
5   1     60      7     20      5        40
6   2     12      3     67      3        55
7   2     34      4     67      3        33
8   2     55      5     67      3        12
9   2     49      5     67      3        18
10  2     60      6     67      3         7
11  2     64      9     67      3         3

有人能提供一个更简单的解决方案吗？在第一个数据框中只需添加一列即可。

另外，由于我需要对许多Var列进行此过程，如果解决方案也能考虑到这一点，那将非常棒。

谢谢！

英文:

I have a dataframe like this:

df1 &lt;- data.frame(ID = c(1,1,1,1,1,2,2,2,2,2,2),
                  Var1 = c(23, 41, 32, 58, 60,12,34,55,49,60,64),
                  Var2 = c(4,5,2,1,7,3,4,5,5,6,9))
   ID Var1 Var2
1   1   23    4
2   1   41    5
3   1   32    2
4   1   58    1
5   1   60    7
6   2   12    3
7   2   34    4
8   2   55    5
9   2   49    5
10  2   60    6
11  2   64    9

and another one like this:

df2 &lt;- data.frame(ID = c(1,2),
                  Var1 = c(20,67),
                  Var2 = c(5,3))
  ID Var1 Var2
1  1   20    5
2  2   67    3

I want to add a column to the first dataframe with the absolute difference between the columns of the two dataframes for every ID group. That is 20 (i.e., the Var1 value with ID equal to 1 in the second dataframe) should be subtracted from every Var1 value with an ID equal to 1 in the first dataframe and then the absolute value should be given in an added column.

I figured out one solution which is rather cumbersome as I have a much larger dataframe with many variables:
First, I would merge the two dataframes by the ID column and then add another column calculating the absolute difference:

df3 &lt;- merge(df1,df2,by=&quot;ID&quot;)
df3$Var1_Diff &lt;- abs(df3$Var1.x - df3$Var1.y)
print(df3)

So that I end up with a ugly dataframe like this:

       ID Var1.x Var2.x Var1.y Var2.y Var1_Diff
1   1     23      4     20      5         3
2   1     41      5     20      5        21
3   1     32      2     20      5        12
4   1     58      1     20      5        38
5   1     60      7     20      5        40
6   2     12      3     67      3        55
7   2     34      4     67      3        33
8   2     55      5     67      3        12
9   2     49      5     67      3        18
10  2     60      6     67      3         7
11  2     64      9     67      3         3

Can anyone suggest a simpler solution where I just add a column to the first dataframe?

Also, as I have do this process for many Var colums it would be awesome if the solution could also take that into consideration.

Thank you!

答案1

得分: 1

这应该可以很好地扩展：

library(dplyr)
bind_rows(list(keep = df1, ref = df2), .id = "source") %>%
  mutate(
    across(starts_with("Var"), \(x) abs(x - x[source == "ref"]),
           .names = "{.col}_diff"),
    .by = ID
  ) %>%
  filter(source == "keep") %>%
  select(-source)
#    ID Var1 Var2 Var1_diff Var2_diff
# 1   1   23    4         3         1
# 2   1   41    5        21         0
# 3   1   32    2        12         3
# 4   1   58    1        38         4
# 5   1   60    7        40         2
# 6   2   12    3        55         0
# 7   2   34    4        33         1
# 8   2   55    5        12         2
# 9   2   49    5        18         2
# 10  2   60    6         7         3
# 11  2   64    9         3         6

英文:

This should scale up nicely:

library(dplyr)
bind_rows(list(keep = df1, ref = df2), .id = &quot;source&quot;) |&gt;
  mutate(
    across(starts_with(&quot;Var&quot;), \(x) abs(x - x[source == &quot;ref&quot;]),
           .names = &quot;{.col}_diff&quot;),
    .by = ID
  ) |&gt;
  filter(source == &quot;keep&quot;) |&gt;
  select(-source)
#    ID Var1 Var2 Var1_diff Var2_diff
# 1   1   23    4         3         1
# 2   1   41    5        21         0
# 3   1   32    2        12         3
# 4   1   58    1        38         4
# 5   1   60    7        40         2
# 6   2   12    3        55         0
# 7   2   34    4        33         1
# 8   2   55    5        12         2
# 9   2   49    5        18         2
# 10  2   60    6         7         3
# 11  2   64    9         3         6

答案2

得分: 1

你可以使用data.table的连接操作来跳过一些中间步骤：

library(data.table)
setDT(df1)[
  setDT(df2),
  `:=`(diff1 = abs(i.Var1 - Var1), diff2 = abs(i.Var2 - Var2)),
  on = "ID"
][]
#>     ID Var1 Var2 diff1 diff2
#>  1:  1   23    4     3     1
#>  2:  1   41    5    21     0
#>  3:  1   32    2    12     3
#>  4:  1   58    1    38     4
#>  5:  1   60    7    40     2
#>  6:  2   12    3    55     0
#>  7:  2   34    4    33     1
#>  8:  2   55    5    12     2
#>  9:  2   49    5    18     2
#> 10:  2   60    6     7     3
#> 11:  2   64    9     3     6

这段代码使用setDT将df1和df2转换为data.table对象，然后使用:=操作符在连接的基础上计算了diff1和diff2两列的差值，并将结果存储在原始数据表中。最后，使用[]运算符将结果返回。

英文:

You can skip some of the intermediates with a data.table join:

library(data.table)
setDT(df1)[
  setDT(df2),
  `:=`(diff1 = abs(i.Var1 - Var1), diff2 = abs(i.Var2 - Var2)),
  on = &quot;ID&quot;
][]
#&gt;     ID Var1 Var2 diff1 diff2
#&gt;  1:  1   23    4     3     1
#&gt;  2:  1   41    5    21     0
#&gt;  3:  1   32    2    12     3
#&gt;  4:  1   58    1    38     4
#&gt;  5:  1   60    7    40     2
#&gt;  6:  2   12    3    55     0
#&gt;  7:  2   34    4    33     1
#&gt;  8:  2   55    5    12     2
#&gt;  9:  2   49    5    18     2
#&gt; 10:  2   60    6     7     3
#&gt; 11:  2   64    9     3     6

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在R中计算一组值与该组的参考值之间的差异。

问题

答案1

答案2

按条件分组 Python 保留所有行。

如何根据一组列的组合作为主键，从另一个CSV文件更新CSV文件？

joint_tests无法通过make.tran()识别asin.sqrt转换。

匹配 pandas 数据框中的值，并用主表中匹配的值替换。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。