Create new column based on non-numerical variables from several columns in the same dataframe in R

Question

I have a large data frame, simplified below, in which I would like to create two new columns, K_status and S_status, based on the values in other columns. I am struggling with how best to code this.

    A <- c("K", "K", "K", "S", "S", "S", "NA")
    B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
    C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
    df <- data.frame(A, B, C)

To generate the K_status and S_status additional columns to df my current code is:

    df <- df %>%
      mutate(K_status = case_when(all("K", "AA", "YY") %in% df) ~ "Mut",
                                  TRUE ~ "WT")) %>%
      mutate(S_status = case_when(all("S", "AB", "Y") %in% df) ~ "Mut",
                                  TRUE ~ "WT"))

This code does not work. My intended new df should look like this:

    A <- c("K", "K", "K", "S", "S", "S", "NA")
    B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
    C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
    K_status <- c("WT", "Mut", "WT", "WT", "WT", "WT", "WT")
    S_status <- c("WT", "WT", "WT", "WT", "WT", "Mut", "WT")
    df <- data.frame(A, B, C, K_status, S_status)

Any help in writing this code to generate K_status and S_status would be greatly appreciated. Thank you.

Answer 1

Score: 1

We can use base R, which is efficient here: rowSums creates a logical vector marking the rows where all three columns match their target values, and the new columns are then assigned based on it.

    # Compute both index vectors first, while df still has only columns A, B, C
    i1 <- rowSums(df == c("K", "AA", "YY")[col(df)]) == 3
    i2 <- rowSums(df == c("S", "AB", "Y")[col(df)]) == 3
    # Default every row to "WT", then overwrite the matching rows with "Mut"
    df$K_status <- "WT"
    df$K_status[i1] <- "Mut"
    df$S_status <- "WT"
    df$S_status[i2] <- "Mut"

Output:

    > df
       A  B  C K_status S_status
    1  K NA TT       WT       WT
    2  K AA YY      Mut       WT
    3  K AC YY       WT       WT
    4  S NA TT       WT       WT
    5  S AA YY       WT       WT
    6  S AB  Y       WT      Mut
    7 NA LD TT       WT       WT
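
To see why indexing the target vector by col(df) works, the base R steps above can be unpacked one at a time. This is only an illustrative sketch on the question's sample data; the intermediate names target, idx, expanded and hits are made up here and do not appear in the answer.

```r
# Sample data from the question; "NA" here is the string "NA", not a missing value
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT"),
  stringsAsFactors = FALSE
)

target <- c("K", "AA", "YY")

# col(df) is a 7 x 3 integer matrix holding each cell's column index
idx <- col(df)

# Indexing target by idx yields one target value per cell, in column-major order:
# seven "K", then seven "AA", then seven "YY"
expanded <- target[idx]

# Comparing the data frame with the expanded vector gives a logical matrix
hits <- df == expanded

# A row is a full match only when all of its cells match
i1 <- rowSums(hits) == ncol(df)
which(i1)  # row 2 is the only full match for this target
```

Note that with real missing values (NA rather than the string "NA") the comparison would produce NA entries, so rowSums would need na.rm = TRUE or similar handling.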

Or with tidyverse, in a vectorized way: create a key/value dataset (or a named list), loop over the columns in if_all, extract the corresponding value from the keydat dataset, compare, and use case_when to create the new columns.

    library(dplyr)

    # One row of keydat per status: row 1 is the K pattern, row 2 is the S pattern
    keydat <- tibble(A = c("K", "S"), B = c("AA", "AB"), C = c("YY", "Y"))

    # A:C is used in both calls so that only the key columns are compared
    df %>%
      mutate(K_status = case_when(if_all(A:C,
                 ~ .x == keydat[[cur_column()]][1]) ~ "Mut", TRUE ~ "WT"),
             S_status = case_when(if_all(A:C,
                 ~ .x == keydat[[cur_column()]][2]) ~ "Mut", TRUE ~ "WT"))

Output:

       A  B  C K_status S_status
    1  K NA TT       WT       WT
    2  K AA YY      Mut       WT
    3  K AC YY       WT       WT
    4  S NA TT       WT       WT
    5  S AA YY       WT       WT
    6  S AB  Y       WT      Mut
    7 NA LD TT       WT       WT
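
The "named list" variant mentioned above is not spelled out in the answer. One possible sketch, in base R rather than dplyr, is shown below; the helper match_key and the lists k_key and s_key are hypothetical names introduced only for illustration.

```r
# Sample data from the question
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for rows where every keyed column equals its key value
match_key <- function(data, key) {
  # One logical vector per key column, then combined element-wise with AND
  per_column <- Map(function(col, val) data[[col]] == val, names(key), key)
  Reduce(`&`, per_column)
}

# Named lists: the names are the columns to check, the values are the targets
k_key <- list(A = "K", B = "AA", C = "YY")
s_key <- list(A = "S", B = "AB", C = "Y")

# Compute both matches before adding the new columns
k_match <- match_key(df, k_key)
s_match <- match_key(df, s_key)

df$K_status <- ifelse(k_match, "Mut", "WT")
df$S_status <- ifelse(s_match, "Mut", "WT")
df
```

Because the key names carry the column names, adding another status only requires another named list, without touching the comparison logic.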

Answer 2

Score: 1

We can use similar code after correcting several inconsistencies:

  • Include rowwise(), because in the comparison x %in% z the z must be evaluated row by row.
  • Replace %in% df with a proper reference to columns A:C of the data frame, using c_across.
  • Use all(c(...) %in% x) instead of all(...) %in% x.

    library(dplyr)

    df %>%
      rowwise() %>%
      mutate(K_status = case_when(all(c("K", "AA", "YY") %in% c_across(A:C)) ~ "Mut",
                                  TRUE ~ "WT")) %>%
      mutate(S_status = case_when(all(c("S", "AB", "Y") %in% c_across(A:C)) ~ "Mut",
                                  TRUE ~ "WT")) %>%
      ungroup()

    # A tibble: 7 × 5
      A     B     C     K_status S_status
      <chr> <chr> <chr> <chr>    <chr>
    1 K     NA    TT    WT       WT
    2 K     AA    YY    Mut      WT
    3 K     AC    YY    WT       WT
    4 S     NA    TT    WT       WT
    5 S     AA    YY    WT       WT
    6 S     AB    Y     WT       Mut
    7 NA    LD    TT    WT       WT
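
One thing worth checking before relying on the %in% form: it tests whether each value occurs anywhere in the row, not whether it sits in a particular column. The small base R sketch below illustrates the difference on two made-up rows (in_order_row and shuffled_row are hypothetical, not taken from the question's data). For the sample data both readings give the same result.

```r
key <- c("K", "AA", "YY")

# A row whose values are in the expected columns
in_order_row <- c(A = "K", B = "AA", C = "YY")

# A hypothetical row with the same values in different columns
shuffled_row <- c(A = "AA", B = "YY", C = "K")

# Membership test, as in all(c(...) %in% x): column position is ignored
in_order   <- all(key %in% in_order_row)
any_order  <- all(key %in% shuffled_row)

# Position-aware test: compares column by column
positional <- all(shuffled_row == key)

c(in_order = in_order, any_order = any_order, positional = positional)
```

If the column a value appears in matters for your data, an element-wise comparison such as the one in Answer 1 may be the safer choice.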

huangapple
  • Posted on February 18, 2023, 01:05:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75487256.html