Create new column based on non-numerical variables from several columns in the same dataframe in R

Question


I have a large data frame, simplified below, to which I would like to add two new columns, K_status and S_status, based on the values in several other columns. I am struggling with how best to code this.

A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
df <- data.frame(A, B, C)

To add the K_status and S_status columns to df, my current code is:

df <- df %>%
  mutate(K_status = case_when(all("K", "AA", "YY") %in% df) ~ "Mut",
                              TRUE ~ "WT")) %>%
  mutate(S_status = case_when(all("S", "AB", "Y") %in% df) ~ "Mut",
                              TRUE ~ "WT"))

This code does not work. My intended new df should look like this:

A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
K_status <- c("WT", "Mut", "WT", "WT", "WT", "WT", "WT")
S_status <- c("WT", "WT", "WT", "WT", "WT", "Mut", "WT")
df <- data.frame(A, B, C, K_status, S_status)

Any help in writing this code to generate K_status and S_status would be greatly appreciated. Thank you.

Answer 1

Score: 1


In base R, an efficient option is to use rowSums to build a logical vector and then assign based on it:

i1 <- rowSums(df == c("K", "AA", "YY")[col(df)]) == 3
i2 <- rowSums(df == c("S", "AB", "Y")[col(df)]) == 3
df$K_status <- "WT"
df$K_status[i1] <- "Mut"
df$S_status <- "WT"
df$S_status[i2] <- "Mut"

Output:

> df
   A  B  C K_status S_status
1  K NA TT       WT       WT
2  K AA YY      Mut       WT
3  K AC YY       WT       WT
4  S NA TT       WT       WT
5  S AA YY       WT       WT
6  S AB  Y       WT      Mut
7 NA LD TT       WT       WT
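
The indexing trick in the base R answer can be unpacked step by step. The following is a minimal sketch, assuming the question's sample data; the names key, expanded, matches and k_rows are illustrative and do not appear in the original answer.

```r
# Rebuild the question's sample data (the "NA" entries are literal strings).
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
)

key <- c("K", "AA", "YY")

# col(df) is a matrix of column indices with the same shape as df,
# so key[col(df)] repeats each key value down its own column.
expanded <- matrix(key[col(df)], nrow = nrow(df))

# Compare element by element, then count the matches in each row.
matches <- rowSums(df == expanded)

# A row is a hit only when every key column matches.
k_rows <- matches == length(key)
```

Comparing against length(key) rather than a hard-coded 3 keeps the check correct if the number of key columns changes. The comparison must run before the status columns are added to df, as it does in the answer, because col(df) covers every column present at that point.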

Alternatively, use the tidyverse in a vectorized way: create a key/value dataset (or a named list), loop over the columns with if_all, look up the corresponding value in keydat, compare it with the column, and use case_when to create the new columns.

library(dplyr)

# One column per key column; row 1 holds the K pattern, row 2 the S pattern.
keydat <- tibble(A = c("K", "S"), B = c("AA", "AB"), C = c("YY", "Y"))

df %>%
  # A:C in both calls, so that any other columns in df are not compared.
  mutate(K_status = case_when(if_all(A:C,
           ~ .x == keydat[[cur_column()]][1]) ~ "Mut", TRUE ~ "WT"),
         S_status = case_when(if_all(A:C,
           ~ .x == keydat[[cur_column()]][2]) ~ "Mut", TRUE ~ "WT"))

Output:

   A  B  C K_status S_status
1  K NA TT       WT       WT
2  K AA YY      Mut       WT
3  K AC YY       WT       WT
4  S NA TT       WT       WT
5  S AA YY       WT       WT
6  S AB  Y       WT      Mut
7 NA LD TT       WT       WT
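
For a larger data frame with many such patterns, the key/value idea could be written as a loop in base R. This is only a sketch under stated assumptions: the keys list and the loop variables are hypothetical names, not part of the original answer, and %in% is used in place of == so that a genuinely missing value counts as a non-match.

```r
# Rebuild the question's sample data (the "NA" entries are literal strings).
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
)

# Hypothetical key table: one named pattern per new status column.
keys <- list(
  K_status = c(A = "K", B = "AA", C = "YY"),
  S_status = c(A = "S", B = "AB", C = "Y")
)

for (status in names(keys)) {
  key <- keys[[status]]
  # Test each named column against its expected value, one logical vector
  # per column; %in% returns FALSE (not NA) for missing values.
  per_column <- Map(function(col, value) df[[col]] %in% value, names(key), key)
  # A row is a hit only if every column matched.
  hit <- Reduce(`&`, per_column)
  df[[status]] <- ifelse(hit, "Mut", "WT")
}
```

Because the columns are looked up by name, the status columns added in earlier iterations do not affect later comparisons.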

Answer 2

Score: 1


We can use similar code after correcting several inconsistencies:

- Include rowwise(), because in x %in% z the z is evaluated row by row.

- Replace %in% df with a proper reference to columns A:C of the data frame, using c_across.

- Use all(c(...) %in% x) instead of all(...) %in% x.

df %>%
    rowwise() %>%
    mutate(K_status = case_when(all(c("K", "AA", "YY") %in% c_across(A:C)) ~ "Mut",
           TRUE ~ "WT")) %>%
    mutate(S_status = case_when(all(c("S", "AB", "Y") %in% c_across(A:C)) ~ "Mut",
           TRUE ~ "WT")) %>%
    ungroup()


# A tibble: 7 × 5
  A     B     C     K_status S_status
  <chr> <chr> <chr> <chr>    <chr>   
1 K     NA    TT    WT       WT      
2 K     AA    YY    Mut      WT      
3 K     AC    YY    WT       WT      
4 S     NA    TT    WT       WT      
5 S     AA    YY    WT       WT      
6 S     AB    Y     WT       Mut     
7 NA    LD    TT    WT       WT  
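
One point to keep in mind with the %in% test: it checks whether each value occurs anywhere in the row, not whether it sits in a particular column. The sketch below uses base R to mirror that logic on a small made-up data frame (df2 and the variable names are hypothetical, chosen only for illustration) and contrasts it with a column-by-column comparison.

```r
# Made-up data: row 2 holds the same values as row 1, but with A and B swapped.
df2 <- data.frame(A = c("K", "AA"), B = c("AA", "K"), C = c("YY", "YY"))
key <- c("K", "AA", "YY")

# Membership test: the same logic as all(key %in% c_across(A:C)), row by row.
by_membership <- apply(df2, 1, function(row) all(key %in% row))

# Positional test: each column must equal its own key value.
by_position <- df2$A == key[1] & df2$B == key[2] & df2$C == key[3]
```

Here by_membership flags both rows, while by_position flags only the first. On the question's sample data the two tests agree, so the choice only matters if a key value can legitimately appear in a different column.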

  • Published on February 18, 2023, 01:05:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75487256.html