Mutate a new column according to the values of each row
Question
I have the following toy data frame.
toy.df <- data.frame(Name = c("group1", "group2", "group3", "group4", "group5", "group6", "group7"),
                     col1 = c("pos", "neg", "NA", "pos", "neg", "NA", "pos"),
                     col2 = c("pos", "pos", "NA", "pos", "neg", "NA", "neg"),
                     col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg"))
I would like to add a new column that checks the values of all columns in each row. If they are all "pos" or "NA", the new value should be "pos"; if they are all "neg" or "NA", it should be "neg"; and if the row contains both "pos" and "neg" (with or without "NA"), it should be "both".
The new column looks as follows:
col4 <- c("pos", "both", "pos", "pos","neg", "neg","both")
Here is the final data frame:
Name col1 col2 col3 col4
group1 pos pos pos pos
group2 neg pos NA both
group3 NA NA pos pos
group4 pos pos NA pos
group5 neg neg neg neg
group6 NA NA neg neg
group7 pos neg neg both
Answer 1
Score: 3
Since the "NA" in your data frame is the literal string "NA", we first need to turn it into a real missing value (NA) with na_if. Then use case_when to supply the conditions for the new column. We need rowwise so that the conditions are evaluated for each row separately. The final TRUE ~ "unknown" in case_when catches any strings other than "pos" and "neg" in col1 to col3.
I added two entries to show the behaviour when every column in a row is NA, or when there is a typo in one of the columns.
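To see why the conversion matters, here is a quick check on a made-up vector. The names x and y are only illustrative and do not come from the question; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Made-up vector with a literal "NA" string, as in the question's data
x <- c("pos", "NA", "neg")

# The string "NA" is an ordinary value, so is.na() does not flag it
is.na(x)

# na_if() replaces every value equal to "NA" with a real missing value
y <- na_if(x, "NA")
is.na(y)
```

After the conversion, functions such as is.na() and the na.rm argument of all() treat those entries as missing.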
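library(dplyr)

toy.df %>%
  rowwise() %>%
  # turn the literal string "NA" into a real missing value
  mutate(across(everything(), ~ na_if(.x, "NA")),
         col4 = case_when(
           # every column missing: nothing to classify
           all(is.na(c_across(col1:col3))) ~ NA_character_,
           all(c_across(col1:col3) == "pos", na.rm = TRUE) ~ "pos",
           all(c_across(col1:col3) == "neg", na.rm = TRUE) ~ "neg",
           # a mix of "pos" and "neg", possibly with NA
           all(c_across(col1:col3) %in% c("pos", "neg", NA)) ~ "both",
           # anything else, e.g. a typo
           TRUE ~ "unknown")) %>%
  ungroup()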
# A tibble: 9 × 5
Name col1 col2 col3 col4
<chr> <chr> <chr> <chr> <chr>
1 group1 pos pos pos pos
2 group2 neg pos NA both
3 group3 NA NA pos pos
4 group4 pos pos NA pos
5 group5 neg neg neg neg
6 group6 NA NA neg neg
7 group7 pos neg neg both
8 group8 NA NA NA NA
9 group9 pos pos typo unknown
Data
toy.df <- structure(list(Name = c("group1", "group2", "group3", "group4",
"group5", "group6", "group7", "group8", "group9"), col1 = c("pos",
"neg", "NA", "pos", "neg", "NA", "pos", NA, "pos"), col2 = c("pos",
"pos", "NA", "pos", "neg", "NA", "neg", NA, "pos"), col3 = c("pos",
"NA", "pos", "NA", "neg", "neg", "neg", NA, "typo")), class = "data.frame", row.names = c(NA,
-9L))
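If rowwise turns out to be slow on a larger data frame, the same classification can probably be done column-wise. The following is only a sketch under some assumptions: it uses if_any (available in dplyr 1.0.4 and later), it runs on the question's original seven rows, and the helper columns has_pos and has_neg are names made up for illustration. Unlike the code above, it does not flag typos as "unknown".

```r
library(dplyr)

# The question's original toy data (literal "NA" strings)
toy.df <- data.frame(
  Name = c("group1", "group2", "group3", "group4", "group5", "group6", "group7"),
  col1 = c("pos", "neg", "NA", "pos", "neg", "NA", "pos"),
  col2 = c("pos", "pos", "NA", "pos", "neg", "NA", "neg"),
  col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg")
)

res <- toy.df %>%
  # turn the literal string "NA" into a real missing value
  mutate(across(col1:col3, ~ na_if(.x, "NA"))) %>%
  mutate(
    # %in% returns FALSE for missing values, so NA never counts as a match
    has_pos = if_any(col1:col3, ~ .x %in% "pos"),
    has_neg = if_any(col1:col3, ~ .x %in% "neg"),
    col4 = case_when(
      has_pos & has_neg ~ "both",
      has_pos ~ "pos",
      has_neg ~ "neg",
      TRUE ~ NA_character_  # neither value present in the row
    )
  ) %>%
  select(-has_pos, -has_neg)

res
```

Here each condition is computed once for the whole column set instead of once per row, which is the main reason to prefer it over rowwise when the data is large.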
Answer 2
Score: 1
Another way:
toy.df$group6 <- apply(toy.df, 1, \(x) {
  # distinct values in col1 to col3, ignoring the literal string "NA"
  val <- setdiff(unique(x[2:4]), "NA")
  if (length(val) == 2) {
    "both"
  } else if (length(val) == 1 && val == "pos") {
    "pos"
  } else {
    "neg"
  }
})
toy.df
out:
Name col1 col2 col3 group6
1 group1 pos pos pos pos
2 group2 neg pos NA both
3 group3 NA NA pos pos
4 group4 pos pos NA pos
5 group5 neg neg neg neg
6 group6 NA NA neg neg
7 group7 pos neg neg both
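The same idea can also be written without apply, by counting matches per row with rowSums. This is a rough base R sketch rather than part of the answer above; the names m, has_pos and has_neg are made up for illustration, and the result is stored in col4 as the question asks.

```r
# The question's original toy data (literal "NA" strings)
toy.df <- data.frame(
  Name = c("group1", "group2", "group3", "group4", "group5", "group6", "group7"),
  col1 = c("pos", "neg", "NA", "pos", "neg", "NA", "pos"),
  col2 = c("pos", "pos", "NA", "pos", "neg", "NA", "neg"),
  col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg")
)

# Character matrix of the three value columns
m <- as.matrix(toy.df[, c("col1", "col2", "col3")])

# TRUE where a row contains at least one "pos" / "neg";
# na.rm = TRUE keeps this safe if real missing values are present
has_pos <- rowSums(m == "pos", na.rm = TRUE) > 0
has_neg <- rowSums(m == "neg", na.rm = TRUE) > 0

toy.df$col4 <- ifelse(has_pos & has_neg, "both",
               ifelse(has_pos, "pos",
               ifelse(has_neg, "neg", NA_character_)))

toy.df
```

A row with neither "pos" nor "neg" gets NA here instead of falling through to "neg".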