Create new column based on non-numerical variables from several columns in the same dataframe in R
Question
I have a large df, which I have simplified below, and I would like to create two new columns, K_status and S_status, based on the values in the other columns. I am struggling with how best to code this.
A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
df <- data.frame(A, B, C)
To add the K_status and S_status columns to df, my current code is:
df <- df %>%
mutate(K_status = case_when(all("K", "AA", "YY") %in% df) ~ "Mut",
TRUE ~ "WT")) %>%
mutate(S_status = case_when(all("S", "AB", "Y") %in% df) ~ "Mut",
TRUE ~ "WT"))
This code is not working. The intended new df should look like this:
A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
K_status <- c("WT", "Mut", "WT", "WT", "WT", "WT", "WT")
S_status <- c("WT", "WT", "WT", "WT", "WT", "Mut", "WT")
df <- data.frame(A, B, C, K_status, S_status)
Any help in writing this code to generate K_status and S_status would be greatly appreciated. Thank you.
Answer 1
Score: 1
We may use base R. It is more efficient to create a logical vector with rowSums and then do the assignment based on it:
i1 <- rowSums(df == c("K", "AA", "YY")[col(df)]) == 3
i2 <- rowSums(df == c("S", "AB", "Y")[col(df)]) == 3
df$K_status <- "WT"
df$K_status[i1] <- "Mut"
df$S_status <- "WT"
df$S_status[i2] <- "Mut"
-output
> df
A B C K_status S_status
1 K NA TT WT WT
2 K AA YY Mut WT
3 K AC YY WT WT
4 S NA TT WT WT
5 S AA YY WT WT
6 S AB Y WT Mut
7 NA LD TT WT WT
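The key step above is c(...)[col(df)]. The sketch below rebuilds the question's data and prints the intermediate results; the names key_K, aligned, hits and naive are illustrative only. col(df) returns the column number of every cell, so indexing the key with it repeats each expected value once per row of its own column, and df == aligned then compares every cell with the value expected for that column. Comparing against the bare three-element key instead recycles it cell by cell, which misaligns the values and, on this data, misses row 2:

```r
# Rebuild the example data from the question
A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
df <- data.frame(A, B, C)

# Illustrative name: expected values for columns A, B and C (K pattern)
key_K <- c("K", "AA", "YY")

# col(df) is a 7 x 3 matrix of column numbers; used as an index it yields a
# length-21 vector: "K" seven times, then "AA" seven times, then "YY" seven times
aligned <- key_K[col(df)]

# Cell-by-cell comparison, returned as a 7 x 3 logical matrix
hits <- df == aligned
print(hits)

# A row matches only if all of its cells match
i1 <- rowSums(hits) == ncol(df)
print(i1)

# Without [col(df)] the 3-element key is recycled cell by cell down the
# columns (A is compared with "K", "AA", "YY", "K", ...), so row 2 is missed
naive <- rowSums(df == key_K) == ncol(df)
print(naive)
```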
Or with the tidyverse, in a vectorized way for efficient execution: create a key/value dataset (or a named list), then loop over the columns with if_all, extract the corresponding value from the keydat dataset with cur_column(), compare, and use case_when to create the new columns:
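library(dplyr)
# Key dataset: row 1 holds the K pattern, row 2 the S pattern
keydat <- tibble(A = c("K", "S"), B = c("AA", "AB"), C = c("YY", "Y"))
df %>%
  mutate(
    # if_all() is TRUE only when every column in A:C equals its key value;
    # cur_column() is the name of the column currently being compared.
    # Selecting A:C (not everything()) keeps existing status columns out.
    K_status = case_when(if_all(A:C, ~ .x == keydat[[cur_column()]][1]) ~ "Mut",
                         TRUE ~ "WT"),
    S_status = case_when(if_all(A:C, ~ .x == keydat[[cur_column()]][2]) ~ "Mut",
                         TRUE ~ "WT")
  )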
-output
A B C K_status S_status
1 K NA TT WT WT
2 K AA YY Mut WT
3 K AC YY WT WT
4 S NA TT WT WT
5 S AA YY WT WT
6 S AB Y WT Mut
7 NA LD TT WT WT
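The answer above notes that a named list can serve as the key instead of a dataset. A minimal base R sketch of that idea follows, assuming each status is defined by exactly one expected value per column; the names keys and add_status are hypothetical. Looping over the list means that adding another status only requires another list entry:

```r
# Rebuild the example data from the question
A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
df <- data.frame(A, B, C)

# Hypothetical named list: one entry per status, one expected value per column
keys <- list(
  K = c(A = "K", B = "AA", C = "YY"),
  S = c(A = "S", B = "AB", C = "Y")
)

# Hypothetical helper: "Mut" where every column named in key equals its value.
# lapply() gives one logical vector per column; Reduce(`&`) combines them
# row by row. Real NA cells (not the string "NA") would yield NA here.
add_status <- function(data, key) {
  per_column <- lapply(names(key), function(col) data[[col]] == key[[col]])
  ifelse(Reduce(`&`, per_column), "Mut", "WT")
}

# Creates K_status and S_status from the list names
for (nm in names(keys)) {
  df[[paste0(nm, "_status")]] <- add_status(df, keys[[nm]])
}
print(df)
```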
Answer 2
Score: 1
We can use similar code after correcting several inconsistencies:
- include rowwise, as we are comparing x %in% z, in which z is used row-wise
- replace %in% df with a proper reference to the columns A:C of the data frame, using c_across
- use all(c(...) %in% x) instead of all(...) %in% x
df %>%
rowwise() %>%
mutate(K_status = case_when(all(c("K", "AA", "YY") %in% c_across(A:C)) ~ "Mut",
TRUE ~ "WT")) %>%
mutate(S_status = case_when(all(c("S", "AB", "Y") %in% c_across(A:C)) ~ "Mut",
TRUE ~ "WT")) %>%
ungroup()
# A tibble: 7 × 5
A B C K_status S_status
<chr> <chr> <chr> <chr> <chr>
1 K NA TT WT WT
2 K AA YY Mut WT
3 K AC YY WT WT
4 S NA TT WT WT
5 S AA YY WT WT
6 S AB Y WT Mut
7 NA LD TT WT WT
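For comparison, the same row-wise membership test can be sketched in base R with apply() over the rows, which here passes each row to the function as a character vector, much as c_across(A:C) does inside rowwise(); the helper row_has_all and the one-row data frame swapped are hypothetical. Note that all(c(...) %in% x) only checks that each value occurs somewhere in the row, not in which column, so a row holding the same values in other columns is also flagged, whereas the position-aware rowSums() comparison from the first answer does not flag it:

```r
# Rebuild the example data from the question
A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
df <- data.frame(A, B, C)

# Hypothetical helper: TRUE for each row of columns A:C that contains every
# element of values somewhere in the row (column order is not checked)
row_has_all <- function(data, values) {
  apply(data[, c("A", "B", "C")], 1, function(row) all(values %in% row))
}

df$K_status <- ifelse(row_has_all(df, c("K", "AA", "YY")), "Mut", "WT")
df$S_status <- ifelse(row_has_all(df, c("S", "AB", "Y")), "Mut", "WT")
print(df)

# Hypothetical row holding the K values in different columns
swapped <- data.frame(A = "AA", B = "YY", C = "K")

# The membership test flags it ...
in_any_column <- row_has_all(swapped, c("K", "AA", "YY"))
print(in_any_column)
# ... while the position-aware rowSums() comparison does not
in_own_column <- rowSums(swapped == c("K", "AA", "YY")[col(swapped)]) == 3
print(in_own_column)
```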
</details>