Mutate across pairs of columns with dplyr

Question

I am trying to apply a function to pairs of columns in dplyr. In the minimal example below, there are columns a and b, as well as a_exp and b_exp. I want to create a new column a_mult = a * a_exp, and likewise for b (and c, d, e, ...). I know I can solve this with a loop and a function, with purrr, or with pivot_longer, but I want to know whether it can be done with mutate(across()). The real problem is more complicated: I can't create both columns in one go, because in the real case a_exp is not a function of a. When I run the following code, I get this error:

> "promise already under evaluation: recursive default argument reference or earlier problems?"

I don't understand why it happens or how to fix it.

    library(dplyr)

    test <- tibble(a = runif(10), b = runif(10)) %>%
      mutate(across(c(a, b), exp, .names = '{.col}_exp'))

    # This second step is the one that raises the error
    test <- test %>%
      mutate(across(c(a, b),
                    ~ .x * .data[[paste0(cur_column(), '_exp')]],
                    .names = '{.col}_mult'))

Answer 1

Score: 1

You can use get() to look up the values of the paired column by name.

    library(tidyverse)

    set.seed(12)

    test <- tibble(a = runif(10), b = runif(10)) %>%
      mutate(across(c(a, b), exp, .names = "{.col}_exp"))

    test %>%
      mutate(across(c(a, b), ~ .x * get(paste0(cur_column(), "_exp")), .names = "{.col}_mult"))
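As a quick sanity check on the get() approach, the sketch below compares the across() result with the product computed by hand. The name res is only illustrative. One caveat to keep in mind: get() looks a name up as a string, so if the expected column (for example a_exp) is missing, it may either fail or pick up an object of the same name from an enclosing environment.

```r
library(dplyr)

set.seed(12)
test <- tibble(a = runif(10), b = runif(10)) %>%
  mutate(across(c(a, b), exp, .names = "{.col}_exp"))

# cur_column() returns the name of the column being processed ("a", then "b"),
# so get() fetches the matching "<name>_exp" column from the data.
res <- test %>%
  mutate(across(c(a, b),
                ~ .x * get(paste0(cur_column(), "_exp")),
                .names = "{.col}_mult"))

# Both comparisons should be TRUE
all.equal(res$a_mult, test$a * test$a_exp)
all.equal(res$b_mult, test$b * test$b_exp)
```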

If you can choose which operation comes first, it may be easier to do the multiplication first and then compute exp.

    test <- tibble(a = runif(10), b = runif(10)) %>%
      mutate(across(c(a, b), ~ .x * exp(.x), .names = "{.col}_mult"),
             across(c(a, b), exp, .names = "{.col}_exp"))

Output

    # A tibble: 10 × 6
       a       b     a_exp b_exp a_mult  b_mult
       <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>
     1 0.0694  0.393 1.07  1.48  0.0743  0.582
     2 0.818   0.814 2.27  2.26  1.85    1.84
     3 0.943   0.376 2.57  1.46  2.42    0.548
     4 0.269   0.381 1.31  1.46  0.353   0.557
     5 0.169   0.265 1.18  1.30  0.201   0.345
     6 0.0339  0.439 1.03  1.55  0.0351  0.682
     7 0.179   0.458 1.20  1.58  0.214   0.723
     8 0.642   0.541 1.90  1.72  1.22    0.929
     9 0.0229  0.666 1.02  1.95  0.0234  1.30
    10 0.00832 0.113 1.01  1.12  0.00839 0.126

Answer 2

Score: 1

We have at least two ways to do this:

1. An approach with reduce from the purrr package:

    library(dplyr)
    library(purrr)
    library(stringr)

    set.seed(123)

    test %>%
      split.default(str_remove(names(.), "_.*")) %>%
      map_dfr(reduce, `*`) %>%
      rename_with(~ paste0(., "_mult"), everything()) %>%
      bind_cols(test)

       a_mult b_mult a      b      a_exp b_exp
       <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
     1 1.86   1.14   0.819  0.616  2.27  1.85
     2 2.53   0.861  0.964  0.515  2.62  1.67
     3 2.33   2.31   0.924  0.920  2.52  2.51
     4 0.194  0.718  0.164  0.455  1.18  1.58
     5 0.500  1.98   0.351  0.847  1.42  2.33
     6 1.14   2.08   0.617  0.871  1.85  2.39
     7 1.04   1.35   0.580  0.683  1.79  1.98
     8 2.42   1.71   0.942  0.781  2.56  2.18
     9 0.259  0.323  0.210  0.251  1.23  1.29
    10 0.0404 0.0487 0.0388 0.0465 1.04  1.05

2. To use across, we first have to rename the columns with rename_with so that the names form matching pairs:

    library(dplyr)
    library(stringr)

    test %>%
      rename_with(., ~ ifelse(!str_detect(., "\\_"), paste0(., "_start"), .)) %>%
      mutate(across(ends_with('_exp'), ~ . *
        get(str_replace(cur_column(), "exp$", "start")), .names = "mult_{.col}")) %>%
      rename_at(vars(starts_with('mult')), ~ str_remove(., "\\_exp"))

       a_start b_start a_exp b_exp mult_a mult_b
       <dbl>   <dbl>   <dbl> <dbl> <dbl>  <dbl>
     1 0.288   0.957   1.33  2.60  0.383  2.49
     2 0.788   0.453   2.20  1.57  1.73   0.713
     3 0.409   0.678   1.51  1.97  0.616  1.33
     4 0.883   0.573   2.42  1.77  2.14   1.02
     5 0.940   0.103   2.56  1.11  2.41   0.114
     6 0.0456  0.900   1.05  2.46  0.0477 2.21
     7 0.528   0.246   1.70  1.28  0.896  0.315
     8 0.892   0.0421  2.44  1.04  2.18   0.0439
     9 0.551   0.328   1.74  1.39  0.957  0.455
    10 0.457   0.955   1.58  2.60  0.721  2.48
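To make the first approach easier to follow, here is a rough base R sketch of what the split-then-multiply pipeline does step by step. It is a stand-in, not the answer's own code: Reduce and lapply replace purrr's reduce and map_dfr, a plain data.frame replaces the tibble, and the names prefix, groups, mult and res are purely illustrative.

```r
set.seed(123)
test <- data.frame(a = runif(10), b = runif(10))
test$a_exp <- exp(test$a)
test$b_exp <- exp(test$b)

# Group the columns by the part of their name before the first underscore
prefix <- sub("_.*", "", names(test))   # "a" "b" "a" "b"
groups <- split.default(test, prefix)

names(groups)     # "a" "b"
names(groups$a)   # "a" "a_exp"

# Multiply the columns within each group, then label the results
mult <- lapply(groups, function(df) Reduce(`*`, df))
names(mult) <- paste0(names(mult), "_mult")

# Put the products in front of the original columns
res <- cbind(as.data.frame(mult), test)
```

The grouping depends entirely on the naming scheme: any column whose name starts with the same prefix before the first underscore ends up in the same product.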

Answer 3

Score: 1

You can multiply across(a:b) by across(a_exp:b_exp) directly, provided both selections list the columns in the same order so that they pair up by position.

    test %>%
      mutate(across(a:b, .names = '{.col}_mul') * across(a_exp:b_exp))
    # # A tibble: 10 × 6
    #    a      b      a_exp b_exp a_mul  b_mul
    #    <dbl>  <dbl>  <dbl> <dbl> <dbl>  <dbl>
    #  1 0.288  0.957  1.33  2.60  0.383  2.49
    #  2 0.788  0.453  2.20  1.57  1.73   0.713
    #  3 0.409  0.678  1.51  1.97  0.616  1.33
    #  4 0.883  0.573  2.42  1.77  2.14   1.02
    #  5 0.940  0.103  2.56  1.11  2.41   0.114
    #  6 0.0456 0.900  1.05  2.46  0.0477 2.21
    #  7 0.528  0.246  1.70  1.28  0.896  0.315
    #  8 0.892  0.0421 2.44  1.04  2.18   0.0439
    #  9 0.551  0.328  1.74  1.39  0.957  0.455
    # 10 0.457  0.955  1.58  2.60  0.721  2.48

Data

    library(dplyr)

    set.seed(123)

    test <- tibble(a = runif(10), b = runif(10)) %>%
      mutate(across(c(a, b), exp, .names = '{.col}_exp'))
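Because the multiplication in this answer pairs columns by position, one way to avoid depending on the column order in the data is to build both selections from a single vector of base names. The sketch below assumes the paired columns always carry the "_exp" suffix; base_cols, exp_cols and res are illustrative names, and identity is passed as the function explicitly, which is an addition to the answer's code.

```r
library(dplyr)

set.seed(123)
test <- tibble(a = runif(10), b = runif(10)) %>%
  mutate(across(c(a, b), exp, .names = "{.col}_exp"))

# Derive both selections from the same vector so the pairs always line up
base_cols <- c("a", "b")
exp_cols  <- paste0(base_cols, "_exp")

# The left-hand across() supplies the output names; the product is positional
res <- test %>%
  mutate(across(all_of(base_cols), identity, .names = "{.col}_mul") *
           across(all_of(exp_cols), identity))

# Should be TRUE
all.equal(res$a_mul, test$a * test$a_exp)
```

all_of() also fails loudly when one of the expected columns is missing, which is probably preferable to a silent mismatch.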

Answer 4

Score: 0

If the data is organised as shown in your example data, this should do:

    test[, 1:2] * test[, 3:4]
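The positional version returns only the products and keeps the names of the left-hand columns. If the results should be stored under new names alongside the originals, a name-based variant could look like the sketch below. It uses a plain data.frame rather than a tibble, assumes the "_exp" suffix convention from the question, and the vectors base_cols, exp_cols and mult_cols are illustrative.

```r
set.seed(123)
test <- data.frame(a = runif(10), b = runif(10))
test$a_exp <- exp(test$a)
test$b_exp <- exp(test$b)

# Build the three sets of column names from one vector of base names
base_cols <- c("a", "b")
exp_cols  <- paste0(base_cols, "_exp")
mult_cols <- paste0(base_cols, "_mult")

# Element-wise product of the two column blocks, stored as new columns
test[mult_cols] <- test[base_cols] * test[exp_cols]

names(test)   # "a" "b" "a_exp" "b_exp" "a_mult" "b_mult"
```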

huangapple
  • Published on March 8, 2023 at 15:12:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75670222.html