Mutate a new column according to the values of each row

Question

I have the following toy data frame.

toy.df <- data.frame(Name = c("group1", "group2", "group3", "group4", "group5", "group6", "group7"),
                     col1 = c("pos", "neg", "NA", "pos", "neg", "NA", "pos"),
                     col2 = c("pos", "pos", "NA", "pos", "neg", "NA", "neg"),
                     col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg"))

I would like to mutate a new column that checks the values of all columns in each row. If they are all "pos" or "NA", the new value should be "pos"; if they are all "neg" or "NA", it should be "neg"; and if the row contains both "pos" and "neg" (with or without "NA"), it should be "both".

The new column looks as follows:

col4 <- c("pos", "both", "pos", "pos", "neg", "neg", "both")

Here is the final data frame:

  Name col1 col2 col3 col4
group1  pos  pos  pos  pos
group2  neg  pos   NA both
group3   NA   NA  pos  pos
group4  pos  pos   NA  pos
group5  neg  neg  neg  neg
group6   NA   NA  neg  neg
group7  pos  neg  neg both

Answer 1

Score: 3

Since the "NA" in your data frame is the literal string "NA", we first need to turn it into a real missing value (NA) with na_if. Then use case_when to supply the conditions for the new column. We need rowwise so that c_across and all() are evaluated within each row rather than over the whole column. The final TRUE ~ "unknown" in case_when catches any string other than "pos" and "neg" in col1 to col3.
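A minimal sketch of why the na_if step matters, assuming dplyr is installed: is.na() treats the two-character string "NA" as an ordinary value, while na_if() turns it into a genuine missing value. The vector x is a made-up example.

```r
library(dplyr)

x <- c("pos", "NA", NA)   # hypothetical values: a string, the string "NA", a real NA

# The string "NA" is NOT missing; only the real NA is
is.na(x)              # FALSE FALSE TRUE

# na_if() replaces every element equal to "NA" with a real NA
y <- na_if(x, "NA")
is.na(y)              # FALSE TRUE TRUE
```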

I added two entries to show the behaviour when all columns of a row are NA, or when a column contains a typo.

library(dplyr)

toy.df %>%
  rowwise() %>%
  # turn the literal string "NA" into a real missing value
  mutate(across(everything(), ~ na_if(.x, "NA")),
         col4 = case_when(
           # all-NA rows first: all(..., na.rm = TRUE) is TRUE for an empty set
           all(is.na(c_across(col1:col3))) ~ NA_character_,
           all(c_across(col1:col3) == "pos", na.rm = TRUE) ~ "pos",
           all(c_across(col1:col3) == "neg", na.rm = TRUE) ~ "neg",
           all(c_across(col1:col3) %in% c("pos", "neg", NA)) ~ "both",
           # anything else, e.g. a typo
           TRUE ~ "unknown")) %>%
  ungroup()

# A tibble: 9 × 5
  Name   col1  col2  col3  col4
  <chr>  <chr> <chr> <chr> <chr>
1 group1 pos   pos   pos   pos
2 group2 neg   pos   NA    both
3 group3 NA    NA    pos   pos
4 group4 pos   pos   NA    pos
5 group5 neg   neg   neg   neg
6 group6 NA    NA    neg   neg
7 group7 pos   neg   neg   both
8 group8 NA    NA    NA    NA
9 group9 pos   pos   typo  unknown
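The order of the case_when conditions above matters. With na.rm = TRUE, all() on a row that is entirely NA has nothing left to test and returns TRUE, so without the explicit all-NA check listed first, such a row would be labelled "pos". The base R calls below illustrate this; the two row vectors are made-up examples.

```r
row_mixed  <- c("pos", NA, "pos")
row_all_na <- c(NA_character_, NA_character_, NA_character_)

# Without na.rm, a single NA makes the result inconclusive
all(row_mixed == "pos")                  # NA
all(row_mixed == "pos", na.rm = TRUE)    # TRUE

# With na.rm = TRUE, an all-NA row is vacuously TRUE for both tests
all(row_all_na == "pos", na.rm = TRUE)   # TRUE
all(row_all_na == "neg", na.rm = TRUE)   # TRUE

# Hence the all-NA test that case_when checks before the others
all(is.na(row_all_na))                   # TRUE
```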

Data

toy.df <- structure(list(Name = c("group1", "group2", "group3", "group4",
"group5", "group6", "group7", "group8", "group9"), col1 = c("pos",
"neg", "NA", "pos", "neg", "NA", "pos", NA, "pos"), col2 = c("pos",
"pos", "NA", "pos", "neg", "NA", "neg", NA, "pos"), col3 = c("pos",
"NA", "pos", "NA", "neg", "neg", "neg", NA, "typo")), class = "data.frame", row.names = c(NA,
-9L))

Answer 2

Score: 1

Another way:

toy.df$col4 <- apply(toy.df, 1, \(x) {
  # apply() passes each row as a character vector; drop the literal "NA"
  # strings from col1:col3 and keep the distinct values that remain
  val <- setdiff(x[2:4], "NA")
  if (length(val) == 2) {
    "both"           # both "pos" and "neg" occur in the row
  } else if (length(val) == 0) {
    NA_character_    # every column was "NA"
  } else {
    val              # only "pos" or only "neg"
  }
})
toy.df

out:

    Name col1 col2 col3 col4
1 group1  pos  pos  pos  pos
2 group2  neg  pos   NA both
3 group3   NA   NA  pos  pos
4 group4  pos  pos   NA  pos
5 group5  neg  neg  neg  neg
6 group6   NA   NA  neg  neg
7 group7  pos  neg  neg both
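Both answers evaluate one row at a time (rowwise() or apply()). As an alternative not taken from either answer, the same rule can be applied to all rows at once by counting "pos" and "neg" per row with rowSums(). This is a minimal sketch, assuming the original seven-row toy.df from the question; literal "NA" strings and real NA values both count as neither "pos" nor "neg", and unlike the first answer it does not flag typos as "unknown".

```r
toy.df <- data.frame(
  Name = c("group1", "group2", "group3", "group4", "group5", "group6", "group7"),
  col1 = c("pos", "neg", "NA", "pos", "neg", "NA", "pos"),
  col2 = c("pos", "pos", "NA", "pos", "neg", "NA", "neg"),
  col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg")
)

# Character matrix of the three value columns
vals <- as.matrix(toy.df[, c("col1", "col2", "col3")])

# Per-row flags: does the row contain at least one "pos" / one "neg"?
# na.rm = TRUE makes real NA values (if any) count as neither
has_pos <- rowSums(vals == "pos", na.rm = TRUE) > 0
has_neg <- rowSums(vals == "neg", na.rm = TRUE) > 0

toy.df$col4 <- ifelse(has_pos & has_neg, "both",
               ifelse(has_pos, "pos",
               ifelse(has_neg, "neg", NA_character_)))
toy.df$col4
# "pos" "both" "pos" "pos" "neg" "neg" "both"
```

The comparison and rowSums() run over the whole matrix, so no per-row R function call is needed.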

huangapple
  • Posted on April 20, 2023, 08:44:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76059775.html