Pass a variable in mutate

Question

I have a data frame containing NA values in several columns. I would like to overwrite these values.
I set up a loop that goes through each column and applies the mutate function.
The column name is stored in a name variable. How do I use it inside the mutate function?

df <- data.frame("Colonne1" = c(1, 2, NA, 4, 5, NA), "Colonne2" = c(6, NA, 8, 9, NA, 11), "Colonne3" = c(NA, 13, 14, NA, 16, 17), "Colonne4" = runif(6))
means_ <- colMeans(df, na.rm = TRUE)[-c(ncol(df))]
for(name in names(means_)){
  df <- df %>% mutate(!!name = ifelse(is.na(!!name, Colonne4 * means_[name], !!name)))
}

Error

Error: unexpected '=' in:
"  df <- df %>%
    mutate(!!name ="

Answer 1

Score: 3

It might be easier to do this with across() inside mutate(). There is no need to calculate the means ahead of time:

library(dplyr)

set.seed(1234)
df <- data.frame("Colonne1" = c(1, 2, NA, 4, 5, NA), "Colonne2" = c(6, NA, 8, 9, NA, 11), "Colonne3" = c(NA, 13, 14, NA, 16, 17), "Colonne4" = runif(6))
df <- df %>%
  mutate(across(Colonne1:Colonne3,
                ~ifelse(is.na(.x),
                        Colonne4*mean(.x, na.rm=TRUE),
                        .x)))
df
#>   Colonne1  Colonne2  Colonne3  Colonne4
#> 1 1.000000  6.000000  1.705551 0.1137034
#> 2 2.000000  5.289545 13.000000 0.6222994
#> 3 1.827824  8.000000 14.000000 0.6092747
#> 4 4.000000  9.000000  9.350692 0.6233794
#> 5 5.000000  7.317781 16.000000 0.8609154
#> 6 1.920932 11.000000 17.000000 0.6403106
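
As a side note, the ifelse(is.na(.x), ..., .x) pattern above can probably be written more compactly with dplyr's coalesce(), which takes the first non-missing value element by element. The following is only a sketch under that assumption; the result name filled is introduced here for illustration and is not part of the original answer.

```r
library(dplyr)

set.seed(1234)
df <- data.frame(
  Colonne1 = c(1, 2, NA, 4, 5, NA),
  Colonne2 = c(6, NA, 8, 9, NA, 11),
  Colonne3 = c(NA, 13, 14, NA, 16, 17),
  Colonne4 = runif(6)
)

# coalesce() keeps .x where it is not NA and falls back to
# Colonne4 * (column mean) in the rows where .x is NA.
# `filled` is a hypothetical name used only in this sketch.
filled <- df %>%
  mutate(across(Colonne1:Colonne3,
                ~ coalesce(.x, Colonne4 * mean(.x, na.rm = TRUE))))
```

Both the fallback and .x are double vectors of the same length here, so coalesce() should combine them without type complaints; with mixed column types the ifelse() version may be the more forgiving choice.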

If you want to know how it would work in the loop, you could do it as shown below. The first thing to note is that when the column name on the left-hand side of an assignment in mutate() comes from a string variable, you need to change = to :=. On the right-hand side, you can convert the name to a symbol and unquote it with !!sym(name), so that mutate() treats it as a column rather than as a string.

set.seed(1234)
df <- data.frame("Colonne1" = c(1, 2, NA, 4, 5, NA), "Colonne2" = c(6, NA, 8, 9, NA, 11), "Colonne3" = c(NA, 13, 14, NA, 16, 17), "Colonne4" = runif(6))
means_ <- colMeans(df, na.rm = TRUE)[-c(ncol(df))]
for(name in names(means_)){
  df <- df %>% mutate({{name}} := ifelse(is.na(!!sym(name)), Colonne4 * means_[!!name], !!sym(name)))
}
df
#>   Colonne1  Colonne2  Colonne3  Colonne4
#> 1 1.000000  6.000000  1.705551 0.1137034
#> 2 2.000000  5.289545 13.000000 0.6222994
#> 3 1.827824  8.000000 14.000000 0.6092747
#> 4 4.000000  9.000000  9.350692 0.6233794
#> 5 5.000000  7.317781 16.000000 0.8609154
#> 6 1.920932 11.000000 17.000000 0.6403106

Created on 2023-06-15 with reprex v2.0.2
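
For comparison, here is a hedged variant of the same loop that uses the .data pronoun instead of !!sym(name). It is a sketch, not the original answer's code: it assumes a dplyr version that provides the .data pronoun, and the result name filled is made up for this example. The left-hand side uses !!name := because a plain = cannot follow an unquoted expression.

```r
library(dplyr)

set.seed(1234)
df <- data.frame(
  Colonne1 = c(1, 2, NA, 4, 5, NA),
  Colonne2 = c(6, NA, 8, 9, NA, 11),
  Colonne3 = c(NA, 13, 14, NA, 16, 17),
  Colonne4 = runif(6)
)

# Means of every column except the last one (Colonne4)
means_ <- colMeans(df, na.rm = TRUE)[-ncol(df)]

# `filled` is a hypothetical name used only in this sketch
filled <- df
for (name in names(means_)) {
  filled <- filled %>%
    mutate(!!name := ifelse(is.na(.data[[name]]),   # look the column up by its string name
                            Colonne4 * means_[[name]],  # precomputed mean for that column
                            .data[[name]]))
}
```

Using .data[[name]] keeps the string as a string and only looks it up inside the data frame, which some readers may find easier to follow than building a symbol with sym().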

huangapple
  • Posted on 2023-06-15 20:32:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76482517.html