Mutate across pairs of columns with dplyr

Question

I am trying to apply a function to pairs of columns in dplyr. In the simple example below, there are columns a and b, as well as a_exp and b_exp. I want to create a new column a_mult = a * a_exp, and another one for b (and c, d, e, ...). I know I can solve this with a loop and a function, with purrr, or with pivot_longer, but I want to know whether it can be done with mutate(across()). This is a minimal example. The real problem is more complicated: I can't create both columns in one go, because in the real case a_exp is not a function of a. I am trying the following code, and I get this error:

"promise already under evaluation: recursive default argument reference or earlier problems?"

I don't understand why it happens or how to fix it.

library(dplyr)

test <- tibble(a = runif(10), b = runif(10)) %>%
    mutate(across(c(a,b), exp, .names = '{.col}_exp'))

# This call raises the error quoted above
test <- test %>%
    mutate(across(c(a,b),
                  ~ .x * .data[[paste0(cur_column(), '_exp')]],
                  .names = '{.col}_mult'))

Answer 1

Score: 1

You can use get() to fetch the values of the partner column by name.

library(tidyverse)

set.seed(12)

test <- tibble(a = runif(10), b = runif(10)) %>%
  mutate(across(c(a, b), exp, .names = "{.col}_exp"))

test %>% mutate(across(c(a, b), ~.x * get(paste0(cur_column(), "_exp")), .names = "{.col}_mult"))

If you can choose which arithmetic operation goes first, it may be easier to do the multiplication first and then exp.

test <- tibble(a = runif(10), b = runif(10)) %>%
  mutate(across(c(a, b), ~.x * exp(.x), .names = "{.col}_mult"),
         across(c(a, b), exp, .names = "{.col}_exp"))

Output

# A tibble: 10 × 6
         a     b a_exp b_exp  a_mult b_mult
     <dbl> <dbl> <dbl> <dbl>   <dbl>  <dbl>
 1 0.0694  0.393  1.07  1.48 0.0743   0.582
 2 0.818   0.814  2.27  2.26 1.85     1.84
 3 0.943   0.376  2.57  1.46 2.42     0.548
 4 0.269   0.381  1.31  1.46 0.353    0.557
 5 0.169   0.265  1.18  1.30 0.201    0.345
 6 0.0339  0.439  1.03  1.55 0.0351   0.682
 7 0.179   0.458  1.20  1.58 0.214    0.723
 8 0.642   0.541  1.90  1.72 1.22     0.929
 9 0.0229  0.666  1.02  1.95 0.0234   1.30
10 0.00832 0.113  1.01  1.12 0.00839  0.126
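
To check that the get() lookup above pairs each column with its own _exp partner, here is a minimal sketch. The third column c and the vector base_cols are illustrative assumptions added for this check; they are not part of the original answer.

```r
library(dplyr)

set.seed(12)

# Illustrative data: three base columns, each with a matching *_exp partner
base_cols <- c("a", "b", "c")
test <- tibble(a = runif(10), b = runif(10), c = runif(10)) %>%
  mutate(across(all_of(base_cols), exp, .names = "{.col}_exp"))

# cur_column() is the name of the column currently being processed,
# so get() fetches the partner column that shares its prefix
result <- test %>%
  mutate(across(all_of(base_cols),
                ~ .x * get(paste0(cur_column(), "_exp")),
                .names = "{.col}_mult"))

# Each product should equal the base column times its partner
stopifnot(isTRUE(all.equal(result$c_mult, result$c * result$c_exp)))
```

Keeping the base names in one vector means a new pair only needs one more entry in base_cols, provided the partner column follows the same naming scheme.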

Answer 2

Score: 1

We have at least two ways to do this:

1. An approach with reduce from the purrr package:

library(dplyr)
library(purrr)
library(stringr)

set.seed(123)

test %>%
  split.default(str_remove(names(.), "_.*")) %>%
  map_dfr(reduce, `*`) %>%
  rename_with(~ paste0(., "_mult"), everything()) %>%
  bind_cols(test)

   a_mult b_mult      a      b a_exp b_exp
    <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl>
 1 1.86   1.14   0.819  0.616   2.27  1.85
 2 2.53   0.861  0.964  0.515   2.62  1.67
 3 2.33   2.31   0.924  0.920   2.52  2.51
 4 0.194  0.718  0.164  0.455   1.18  1.58
 5 0.500  1.98   0.351  0.847   1.42  2.33
 6 1.14   2.08   0.617  0.871   1.85  2.39
 7 1.04   1.35   0.580  0.683   1.79  1.98
 8 2.42   1.71   0.942  0.781   2.56  2.18
 9 0.259  0.323  0.210  0.251   1.23  1.29
10 0.0404 0.0487 0.0388 0.0465  1.04  1.05

2. To use across, we first have to modify the names with rename_with so that the columns form matching pairs of names:

library(dplyr)
library(stringr)

test %>%
    rename_with(., ~ifelse(!str_detect(., "\\_"), paste0(., "_start"), .)) %>%
    mutate(across(ends_with('_exp'), ~ . *
                    get(str_replace(cur_column(), "exp$", "start")), .names = "mult_{.col}")) %>%
    rename_at(vars(starts_with('mult')), ~ str_remove(., "\\_exp"))

  a_start b_start a_exp b_exp mult_a mult_b
     <dbl>   <dbl> <dbl> <dbl>  <dbl>  <dbl>
 1  0.288   0.957   1.33  2.60 0.383  2.49
 2  0.788   0.453   2.20  1.57 1.73   0.713
 3  0.409   0.678   1.51  1.97 0.616  1.33
 4  0.883   0.573   2.42  1.77 2.14   1.02
 5  0.940   0.103   2.56  1.11 2.41   0.114
 6  0.0456  0.900   1.05  2.46 0.0477 2.21
 7  0.528   0.246   1.70  1.28 0.896  0.315
 8  0.892   0.0421  2.44  1.04 2.18   0.0439
 9  0.551   0.328   1.74  1.39 0.957  0.455
10  0.457   0.955   1.58  2.60 0.721  2.48
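
The first approach is easier to follow when the grouping step is inspected on its own. The sketch below rebuilds the example data (the seed and the intermediate names groups and products are assumptions made for illustration) and shows how split.default groups the columns by name prefix before reduce multiplies them.

```r
library(dplyr)
library(purrr)
library(stringr)

set.seed(123)
test <- tibble(a = runif(10), b = runif(10)) %>%
  mutate(across(c(a, b), exp, .names = "{.col}_exp"))

# Group the columns by the part of the name before the first underscore
groups <- split.default(test, str_remove(names(test), "_.*"))
names(groups)    # "a" "b"
names(groups$a)  # "a" "a_exp"

# Multiply the columns within each group, one product per prefix
products <- map(groups, reduce, `*`)

# The product for prefix "a" should be a * a_exp
stopifnot(isTRUE(all.equal(products$a, test$a * test$a_exp)))
```

Because the grouping relies only on the prefix, any extra column that shares a prefix (for example a hypothetical a_log) would also be multiplied into that group's product.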

Answer 3

Score: 1

You can multiply across(a:b) by across(a_exp:b_exp) directly if the columns are in matching pairs.

test %>%
  mutate(across(a:b, .names = '{.col}_mul') * across(a_exp:b_exp))

# # A tibble: 10 × 6
#         a      b a_exp b_exp  a_mul  b_mul
#     <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl>
#  1 0.288  0.957   1.33  2.60 0.383  2.49
#  2 0.788  0.453   2.20  1.57 1.73   0.713
#  3 0.409  0.678   1.51  1.97 0.616  1.33
#  4 0.883  0.573   2.42  1.77 2.14   1.02
#  5 0.940  0.103   2.56  1.11 2.41   0.114
#  6 0.0456 0.900   1.05  2.46 0.0477 2.21
#  7 0.528  0.246   1.70  1.28 0.896  0.315
#  8 0.892  0.0421  2.44  1.04 2.18   0.0439
#  9 0.551  0.328   1.74  1.39 0.957  0.455
# 10 0.457  0.955   1.58  2.60 0.721  2.48

Data

library(dplyr)

set.seed(123)

test <- tibble(a = runif(10), b = runif(10)) %>%
    mutate(across(c(a,b), exp, .names = '{.col}_exp'))
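
Since this approach pairs columns purely by position, it may be worth guarding the column order before multiplying. The sketch below is one possible way to do that; the names left and right and the explicit identity function are assumptions of this sketch (identity is passed so that it does not rely on across() being called without a function, which newer dplyr releases may warn about).

```r
library(dplyr)

set.seed(123)
test <- tibble(a = runif(10), b = runif(10)) %>%
  mutate(across(c(a, b), exp, .names = "{.col}_exp"))

# Guard: the i-th column of a:b must pair with the i-th column of a_exp:b_exp
left  <- names(select(test, a:b))
right <- names(select(test, a_exp:b_exp))
stopifnot(identical(right, paste0(left, "_exp")))

# Element-wise product of the two column blocks; the result takes
# its column names from the left-hand block (a_mul, b_mul)
result <- test %>%
  mutate(across(a:b, identity, .names = "{.col}_mul") *
           across(a_exp:b_exp, identity))
```

If the guard fails, the columns are not in matching order and a name-based lookup such as the get() approach is the safer choice.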

Answer 4

Score: 0

If the data is organised as shown in your example data, this should do:

test[, 1:2] * test[, 3:4]
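
A name-based variant of the same idea avoids depending on column positions. This is only a sketch: base_cols, exp_cols, and products are hypothetical names, and the _mult suffix follows the question rather than this answer.

```r
library(dplyr)

set.seed(123)
test <- tibble(a = runif(10), b = runif(10)) %>%
  mutate(across(c(a, b), exp, .names = "{.col}_exp"))

# Select both blocks by name so the pairing does not depend on column order
base_cols <- c("a", "b")
exp_cols  <- paste0(base_cols, "_exp")

# Element-wise product of two equally shaped data frames
products <- test[base_cols] * test[exp_cols]
names(products) <- paste0(base_cols, "_mult")

# Attach the products to the original data
result <- bind_cols(test, products)
```

Unlike test[, 1:2] * test[, 3:4], this keeps working if more columns are added or the columns are reordered, as long as the naming scheme holds.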
