February 18, 2023, 01:05:20


Create new column based on non-numerical variables from several columns in the same dataframe in R

Question

I have a large data frame, simplified below, in which I would like to create two new columns, K_status and S_status, based on the values in the other columns. I am struggling with how best to code this.

A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
df <- data.frame(A, B, C)

To add the K_status and S_status columns to df, my current code is:

df <- df %>%
  mutate(K_status = case_when(all("K", "AA", "YY") %in% df) ~ "Mut",
                              TRUE ~ "WT")) %>%
  mutate(S_status = case_when(all("S", "AB", "Y") %in% df) ~ "Mut",
                              TRUE ~ "WT"))

This code does not work. My intended new df should look like this:

A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
K_status <- c("WT", "Mut", "WT", "WT", "WT", "WT", "WT")
S_status <- c("WT", "WT", "WT", "WT", "WT", "Mut", "WT")
df <- data.frame(A, B, C, K_status, S_status)

Any help in writing this code to generate K_status and S_status would be greatly appreciated. Thank you.

Answer 1

Score: 1

We can use base R, which should be more efficient: rowSums builds a logical vector marking the rows where all three columns match the pattern, and the assignment is then based on that vector.

# TRUE for rows where A, B and C all equal the pattern, compared column by column
i1 <- rowSums(df == c("K", "AA", "YY")[col(df)]) == 3
i2 <- rowSums(df == c("S", "AB", "Y")[col(df)]) == 3
# default every row to "WT", then overwrite the matching rows
df$K_status <- "WT"
df$K_status[i1] <- "Mut"
df$S_status <- "WT"
df$S_status[i2] <- "Mut"

Output:

> df
   A  B  C K_status S_status
1  K NA TT       WT       WT
2  K AA YY      Mut       WT
3  K AC YY       WT       WT
4  S NA TT       WT       WT
5  S AA YY       WT       WT
6  S AB  Y       WT      Mut
7 NA LD TT       WT       WT
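
The same column-by-column comparison can also be written with a small helper that looks columns up by name. The sketch below is only an illustration: match_pattern() is a hypothetical function (not part of base R or of the answer above), and it assumes the data contain no real missing values, as in the question, where "NA" is a literal string.

```r
# Rebuild the example data from the question ("NA" entries are literal strings)
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
)

# match_pattern() is a hypothetical helper: it returns TRUE for rows whose
# named columns all equal the corresponding element of `pattern`
match_pattern <- function(data, pattern) {
  hits <- Map(function(col, value) data[[col]] == value, names(pattern), pattern)
  Reduce(`&`, hits)  # combine the per-column comparisons with a logical AND
}

df$K_status <- ifelse(match_pattern(df, c(A = "K", B = "AA", C = "YY")), "Mut", "WT")
df$S_status <- ifelse(match_pattern(df, c(A = "S", B = "AB", C = "Y")), "Mut", "WT")
```

Because the helper selects columns by name, it does not matter that K_status has already been added to df when S_status is computed, whereas the rowSums version needs i1 and i2 to be computed before any new column is added.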

Or use the tidyverse in a vectorized way: create a key/value dataset (or a named list), loop over the columns in if_all, extract the corresponding value from keydat with cur_column(), compare, and use case_when to create the new columns.

library(dplyr)
# one row per pattern: row 1 is the K pattern, row 2 is the S pattern
keydat <- tibble(A = c("K", "S"), B = c("AA", "AB"), C = c("YY", "Y"))
df %>%
   # A:C in both calls, so columns missing from keydat are never looked up
   mutate(K_status = case_when(if_all(A:C,
    ~ .x == keydat[[cur_column()]][1]) ~ "Mut", TRUE ~ "WT"),
   S_status = case_when(if_all(A:C,
    ~ .x == keydat[[cur_column()]][2]) ~ "Mut", TRUE ~ "WT"))

Output:

   A  B  C K_status S_status
1  K NA TT       WT       WT
2  K AA YY      Mut       WT
3  K AC YY       WT       WT
4  S NA TT       WT       WT
5  S AA YY       WT       WT
6  S AB  Y       WT      Mut
7 NA LD TT       WT       WT
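
With only three columns, the condition can also be spelled out directly, which may be easier to read than the keydat lookup. This is a minimal sketch of that alternative, not the answer's method; it assumes dplyr is installed and stores the result in a name of my own choosing, res.

```r
library(dplyr)

# Rebuild the example data from the question ("NA" entries are literal strings)
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
)

# Each comparison is vectorized over the whole column, so no rowwise() is needed
res <- df %>%
  mutate(
    K_status = if_else(A == "K" & B == "AA" & C == "YY", "Mut", "WT"),
    S_status = if_else(A == "S" & B == "AB" & C == "Y", "Mut", "WT")
  )
```

The explicit form stops scaling once there are many columns or many patterns, which is where the key/value approach becomes the more practical choice.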

Answer 2

Score: 1

We can use similar code after correcting several inconsistencies:

- Include rowwise(), because in x %in% z the right-hand side z has to be evaluated row by row.

- Replace %in% df with a proper reference to columns A:C of the data frame, using c_across.

- Use all(c(...) %in% x) instead of all(...) %in% x.

df %>%
    rowwise() %>%
    mutate(K_status = case_when(all(c("K", "AA", "YY") %in% c_across(A:C)) ~ "Mut",
           TRUE ~ "WT")) %>%
    mutate(S_status = case_when(all(c("S", "AB", "Y") %in% c_across(A:C)) ~ "Mut",
           TRUE ~ "WT")) %>%
    ungroup()
# A tibble: 7 × 5
  A     B     C     K_status S_status
  <chr> <chr> <chr> <chr>    <chr>   
1 K     NA    TT    WT       WT      
2 K     AA    YY    Mut      WT      
3 K     AC    YY    WT       WT      
4 S     NA    TT    WT       WT      
5 S     AA    YY    WT       WT      
6 S     AB    Y     WT       Mut     
7 NA    LD    TT    WT       WT  
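
One point worth keeping in mind with this approach: %in% tests membership only, so it does not check which column a value came from. The base R sketch below illustrates the difference using a hypothetical row (row_shuffled is not in the question's data) whose values are right but sit in the wrong columns.

```r
pattern <- c("K", "AA", "YY")

row_in_order <- c(A = "K", B = "AA", C = "YY")
row_shuffled <- c(A = "AA", B = "K", C = "YY")  # hypothetical row, for illustration only

# Membership test, as in all(c(...) %in% c_across(A:C)): column order is ignored
member_in_order <- all(pattern %in% row_in_order)   # TRUE
member_shuffled <- all(pattern %in% row_shuffled)   # TRUE

# Position-wise test: each value must sit in its own column
exact_in_order <- all(row_in_order == pattern)      # TRUE
exact_shuffled <- all(row_shuffled == pattern)      # FALSE
```

For the data in the question both tests give the same result; they would differ only if a pattern's values could appear in other columns.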
