Pass a variable in mutate

Question

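The code below loops over the columns of a data frame that contain NA values and tries to overwrite them with mutate(), but it stops with a parse error. The problem is this line:

    df <- df %>% mutate(!!name = ifelse(is.na(!!name, Colonne4 * means_[name], !!name)))

Three things are wrong with it: = has to become := when the left-hand side is unquoted with !!, the closing parenthesis of is.na() is misplaced, and !!name inserts the column name as a string rather than the column itself. A corrected version:

    df <- df %>% mutate(!!name := ifelse(is.na(!!sym(name)), Colonne4 * means_[name], !!sym(name)))

This overwrites the NA values in the column whose name is stored in name.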

I have a data frame containing NA values in each column, and I would like to overwrite these values.
I set up a loop that goes through each column and applies the mutate function.
The column name is held in the variable name. How do I use it inside mutate()?

    df <- data.frame("Colonne1" = c(1, 2, NA, 4, 5, NA), "Colonne2" = c(6, NA, 8, 9, NA, 11), "Colonne3" = c(NA, 13, 14, NA, 16, 17), "Colonne4" = runif(6))
    means_ <- colMeans(df, na.rm = TRUE)[-c(ncol(df))]
    for(name in names(means_)){
      # this is the line that fails to parse
      df <- df %>% mutate(!!name = ifelse(is.na(!!name, Colonne4 * means_[name], !!name)))
    }

Error

    Error: unexpected '=' in:
    "  df <- df %>%
    mutate(!!name ="
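
The message above comes from R's parser, so the code is rejected before mutate() is ever called. The following is a minimal base-R sketch of that point; the helper parses() and the two code strings are made up for this illustration and are not part of the original post.

```r
# Base R only: check whether a piece of code is syntactically valid.
# `parses()` is a hypothetical helper written for this illustration.
parses <- function(code) {
  tryCatch({
    parse(text = code)  # only parses, never evaluates
    TRUE
  }, error = function(e) FALSE)
}

bad  <- "mutate(df, !!name = 1)"   # '=' needs a plain name or string on its left
good <- "mutate(df, !!name := 1)"  # ':=' is parsed as an ordinary operator

parses(bad)
parses(good)
```

Because := is accepted by the parser, the whole expression reaches dplyr, which can then interpret the unquoted left-hand side.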

Answer 1

Score: 3


It might be easier to do this with across() inside mutate(); there is no need to calculate the means ahead of time:

    library(dplyr)
    set.seed(1234)
    df <- data.frame("Colonne1" = c(1, 2, NA, 4, 5, NA), "Colonne2" = c(6, NA, 8, 9, NA, 11), "Colonne3" = c(NA, 13, 14, NA, 16, 17), "Colonne4" = runif(6))
    # Replace each NA in Colonne1..Colonne3 with Colonne4 times that column's mean
    df <- df %>%
      mutate(across(Colonne1:Colonne3,
                    ~ifelse(is.na(.x),
                            Colonne4*mean(.x, na.rm=TRUE),
                            .x)))
    df
    #>   Colonne1  Colonne2  Colonne3  Colonne4
    #> 1 1.000000  6.000000  1.705551 0.1137034
    #> 2 2.000000  5.289545 13.000000 0.6222994
    #> 3 1.827824  8.000000 14.000000 0.6092747
    #> 4 4.000000  9.000000  9.350692 0.6233794
    #> 5 5.000000  7.317781 16.000000 0.8609154
    #> 6 1.920932 11.000000 17.000000 0.6403106
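
Since the question is about column names held in a variable, the across() approach can also take the names as strings. This is a minimal sketch, assuming the names sit in a character vector (the variable cols and the result filled are hypothetical names introduced here) and using all_of(), which dplyr re-exports from tidyselect.

```r
library(dplyr)

set.seed(1234)
df <- data.frame(Colonne1 = c(1, 2, NA, 4, 5, NA),
                 Colonne2 = c(6, NA, 8, 9, NA, 11),
                 Colonne3 = c(NA, 13, 14, NA, 16, 17),
                 Colonne4 = runif(6))

# Hypothetical variable: the columns to fill, given as strings
cols <- c("Colonne1", "Colonne2", "Colonne3")

# all_of() selects exactly the columns named in `cols`
filled <- df %>%
  mutate(across(all_of(cols),
                ~ ifelse(is.na(.x), Colonne4 * mean(.x, na.rm = TRUE), .x)))

# No NA should remain in the selected columns
any(is.na(filled[cols]))
```

all_of() raises an error if one of the names is missing from the data frame, which tends to surface typos early.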

If you want to know how it would work in the loop, you could do it as shown below. The first thing to note is that when the name of the new column comes from a string, = in mutate() has to be changed to :=. On the right-hand side, !!sym(name) turns the string into a symbol, so mutate() treats it as a column rather than as a string.

    library(dplyr)  # already attached if this runs after the first block
    set.seed(1234)
    df <- data.frame("Colonne1" = c(1, 2, NA, 4, 5, NA), "Colonne2" = c(6, NA, 8, 9, NA, 11), "Colonne3" = c(NA, 13, 14, NA, 16, 17), "Colonne4" = runif(6))
    # Column means of everything except the last column (Colonne4)
    means_ <- colMeans(df, na.rm = TRUE)[-c(ncol(df))]
    for(name in names(means_)){
      # := allows a computed name on the left; !!sym(name) refers to the column itself
      df <- df %>% mutate({{name}} := ifelse(is.na(!!sym(name)), Colonne4 * means_[!!name], !!sym(name)))
    }
    df
    #>   Colonne1  Colonne2  Colonne3  Colonne4
    #> 1 1.000000  6.000000  1.705551 0.1137034
    #> 2 2.000000  5.289545 13.000000 0.6222994
    #> 3 1.827824  8.000000 14.000000 0.6092747
    #> 4 4.000000  9.000000  9.350692 0.6233794
    #> 5 5.000000  7.317781 16.000000 0.8609154
    #> 6 1.920932 11.000000 17.000000 0.6403106
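
The same loop can be written without !! or sym(). The sketch below is an alternative spelling, not the answer's own code: it assumes dplyr 1.0 or later, where a glue-style string such as "{name}" is accepted on the left of := and the .data pronoun looks a column up by its string name.

```r
library(dplyr)

set.seed(1234)
df <- data.frame(Colonne1 = c(1, 2, NA, 4, 5, NA),
                 Colonne2 = c(6, NA, 8, 9, NA, 11),
                 Colonne3 = c(NA, 13, 14, NA, 16, 17),
                 Colonne4 = runif(6))

# Column means of everything except the last column (Colonne4)
means_ <- colMeans(df, na.rm = TRUE)[-ncol(df)]

for (name in names(means_)) {
  df <- df %>%
    # "{name}" names the output column; .data[[name]] reads the input column
    mutate("{name}" := ifelse(is.na(.data[[name]]),
                              Colonne4 * means_[[name]],
                              .data[[name]]))
}

# No NA should remain anywhere in the data frame
any(is.na(df))
```

Here means_[[name]] is found in the calling environment because the data frame has no columns called means_ or name; if it did, .env$name would make the lookup explicit.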

Created on 2023-06-15 with reprex v2.0.2

huangapple
  • Posted on June 15, 2023, 20:32:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76482517.html