Applying conditional functions on multiple columns of a dataframe, based on values of a variable

Question

I have a dataframe with a factor variable identifying my groups (here `y`) and multiple numerical variables (to simplify, I only show two here, `x` and `z`):

```R
library(tibble)

df = tribble(
  ~x, ~y, ~z,
  1, "a", 5,
  2, "b", 6,
  3, "a", 7,
  4, "b", 8,
  5, "a", 9,
  6, "b", 10
)
```

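I want to add new columns to my dataframe in which I apply different mathematical functions to those numerical variables (`x` and `z`), based on the values of the factor variable (`y`). For the example dataframe above: all observations with `y == "a"` get 1 added, and those with `y == "b"` get 2 added.

This is my code to do it: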

    library(dplyr)

    df %>% mutate(x_new = case_when(grepl("a", y) ~ x + 1,
                                    grepl("b", y) ~ x + 2))

Output:

    # A tibble: 6 × 4
          x y         z x_new
      <dbl> <chr> <dbl> <dbl>
    1     1 a         5     2
    2     2 b         6     4
    3     3 a         7     4
    4     4 b         8     6
    5     5 a         9     6
    6     6 b        10     8

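My code works fine for adding one variable, but I want to apply the same functions to ALL the numerical variables. In the example, I also want to apply them to the `z` variable and store the values in another new column. Since I have many numerical columns, I don't want to mutate them manually one by one with the approach above. Any advice on how to do this? (Especially tidyverse solutions, but any help is much appreciated.)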
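# Answer 1

**Score**: 2

Make a lookup vector that maps "a" and "b" to 1 and 2, then add it to `df` excluding the `y` column. Finally, suffix the new columns with "_new" and column-bind them back to the original dataframe:

```R
# Named lookup vector: "a" -> 1, "b" -> 2.
ll <- setNames(c(1, 2), c("a", "b"))
# Drop the y column (column 2) and add the per-row increment to every remaining column.
x <- df[, -2] + ll[df$y]
colnames(x) <- paste0(colnames(x), "_new")
# Bind the new columns back onto the original dataframe.
cbind(df, x)
#   x y  z x_new z_new
# 1 1 a  5     2     6
# 2 2 b  6     4     8
# 3 3 a  7     4     8
# 4 4 b  8     6    10
# 5 5 a  9     6    10
# 6 6 b 10     8    12
```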
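
Since the question asks for a tidyverse approach, here is a minimal sketch of one possibility using `dplyr::across()`. It is not part of the answer above. It assumes that every numeric column should get the rule, that `y` only takes the values "a" and "b", and that dplyr 1.0.0 or later is installed. The names `result` and `result_lookup` are only for illustration.

```R
library(dplyr)
library(tibble)

df <- tribble(
  ~x, ~y, ~z,
  1, "a", 5,
  2, "b", 6,
  3, "a", 7,
  4, "b", 8,
  5, "a", 9,
  6, "b", 10
)

# Apply the same conditional rule to every numeric column and store
# each result in a new column named "<column>_new".
result <- df %>%
  mutate(across(
    where(is.numeric),
    ~ case_when(
      y == "a" ~ .x + 1,
      y == "b" ~ .x + 2
    ),
    .names = "{.col}_new"
  ))

# Variant: reuse the lookup-vector idea inside across().
# unname() drops the "a"/"b" names that indexing by y would attach.
ll <- setNames(c(1, 2), c("a", "b"))
result_lookup <- df %>%
  mutate(across(
    where(is.numeric),
    ~ .x + unname(ll[y]),
    .names = "{.col}_new"
  ))

result
```

With `case_when()`, each branch can hold a different expression (for example `.x * 2` for one group), which may suit the "different mathematical functions" case better. The lookup variant is shorter when every group only differs by an added constant. Rows whose `y` matches no branch would come out as `NA` in both variants.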

  • Posted by huangapple on March 8, 2023, 16:05:13
  • Source: https://go.coder-hub.com/75670587.html