Mutate a new column according to the values of each row

Question

I have the following toy data frame.

    toy.df <- data.frame(Name = c("group1", "group2", "group3", "group4", "group5", "group6", "group7"),
                         col1 = c("pos", "neg", "NA", "pos", "neg", "NA", "pos"),
                         col2 = c("pos", "pos", "NA", "pos", "neg", "NA", "neg"),
                         col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg"))

I would like to mutate a new column that checks the values of col1 to col3 in each row. If they are all "pos" or "NA", the new value should be "pos"; if they are all "neg" or "NA", it should be "neg"; and if the row contains both "pos" and "neg" (with or without "NA"), it should be "both".

The new column looks as follows:

    col4 <- c("pos", "both", "pos", "pos", "neg", "neg", "both")

Here is the final data frame:

    Name    col1  col2  col3  col4
    group1  pos   pos   pos   pos
    group2  neg   pos   NA    both
    group3  NA    NA    pos   pos
    group4  pos   pos   NA    pos
    group5  neg   neg   neg   neg
    group6  NA    NA    neg   neg
    group7  pos   neg   neg   both

Answer 1

Score: 3

Since the "NA" values in your data frame are the literal string "NA", we first need to turn them into real missing values (NA) with na_if. Then case_when supplies the conditions for the new column. rowwise is needed so that the conditions are evaluated separately for each row. The final TRUE ~ "unknown" in case_when catches any string other than "pos" and "neg" in col1 to col3.
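To make the na_if step concrete, here is a minimal sketch on a plain character vector (the vectors x and y are made up for illustration and are not part of the data above). It also shows why the all-missing check has to come first in case_when: once the NAs are removed, all() on an empty vector is TRUE.

```r
library(dplyr)

x <- c("pos", "NA", "neg")

# The literal string "NA" is not a missing value
is.na(x)

# na_if() replaces every element equal to "NA" with a real NA
y <- na_if(x, "NA")
is.na(y)

# With real NAs, na.rm = TRUE drops them before all() is evaluated
all(c("pos", NA) == "pos", na.rm = TRUE)

# A row of only NAs leaves nothing to test, and all() of nothing is TRUE,
# so such a row would be labelled "pos" unless it is handled first
all(c(NA, NA, NA) == "pos", na.rm = TRUE)
```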

I added two entries to show the behaviour when all values in a row are NA, or when there's a typo in one of the columns.

    library(dplyr)

    toy.df %>%
      rowwise() %>%
      # Turn the literal string "NA" into a real missing value
      mutate(across(everything(), ~ na_if(.x, "NA")),
             col4 = case_when(
               # Every column in the row is missing
               all(is.na(c_across(col1:col3))) ~ NA_character_,
               all(c_across(col1:col3) == "pos", na.rm = TRUE) ~ "pos",
               all(c_across(col1:col3) == "neg", na.rm = TRUE) ~ "neg",
               # Only "pos", "neg" or NA remain, so the row mixes both
               all(c_across(col1:col3) %in% c("pos", "neg", NA)) ~ "both",
               # Anything else, e.g. a typo
               TRUE ~ "unknown")) %>%
      ungroup()

    # A tibble: 9 × 5
      Name   col1  col2  col3  col4
      <chr>  <chr> <chr> <chr> <chr>
    1 group1 pos   pos   pos   pos
    2 group2 neg   pos   NA    both
    3 group3 NA    NA    pos   pos
    4 group4 pos   pos   NA    pos
    5 group5 neg   neg   neg   neg
    6 group6 NA    NA    neg   neg
    7 group7 pos   neg   neg   both
    8 group8 NA    NA    NA    NA
    9 group9 pos   pos   typo  unknown

Data

    toy.df <- structure(list(Name = c("group1", "group2", "group3", "group4",
    "group5", "group6", "group7", "group8", "group9"), col1 = c("pos",
    "neg", "NA", "pos", "neg", "NA", "pos", NA, "pos"), col2 = c("pos",
    "pos", "NA", "pos", "neg", "NA", "neg", NA, "pos"), col3 = c("pos",
    "NA", "pos", "NA", "neg", "neg", "neg", NA, "typo")), class = "data.frame", row.names = c(NA,
    -9L))
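
For larger data, rowwise can be slow. The following is a sketch of a vectorised variant, not part of the original answer: it counts "pos" and "neg" per row with rowSums and then labels the rows with case_when. The names vals, has_pos and has_neg are introduced here for illustration, and the sketch assumes the 7-row data from the question. Unlike the code above, it does not flag typos as "unknown".

```r
library(dplyr)

# The 7-row toy data from the question, with "NA" as a literal string
toy.df <- data.frame(
  Name = paste0("group", 1:7),
  col1 = c("pos", "neg", "NA", "pos", "neg", "NA", "pos"),
  col2 = c("pos", "pos", "NA", "pos", "neg", "NA", "neg"),
  col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg")
)

# Character matrix of the three value columns
vals <- as.matrix(toy.df[, c("col1", "col2", "col3")])

# Per-row flags: does the row contain at least one "pos" / one "neg"?
# na.rm = TRUE keeps this safe if real NAs are present as well
has_pos <- rowSums(vals == "pos", na.rm = TRUE) > 0
has_neg <- rowSums(vals == "neg", na.rm = TRUE) > 0

toy.df$col4 <- case_when(
  has_pos & has_neg ~ "both",
  has_pos ~ "pos",
  has_neg ~ "neg",
  TRUE ~ NA_character_  # neither value present in the row
)

toy.df
```

Because the flags are computed for all rows at once, no grouping or ungroup() step is needed.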

Answer 2

Score: 1

Another way:

    # Uses the 7-row toy.df from the question; every row there has at least one "pos" or "neg"
    toy.df$group6 <- apply(toy.df, 1, \(x) {
      # Distinct values in col1 to col3, ignoring the literal string "NA"
      val <- setdiff(unique(x[2:4]), "NA")
      if (length(val) == 2) {
        "both"
      } else if (val == "pos") {
        "pos"
      } else {
        "neg"
      }
    })
    toy.df

Output:

        Name col1 col2 col3 group6
    1 group1  pos  pos  pos    pos
    2 group2  neg  pos   NA   both
    3 group3   NA   NA  pos    pos
    4 group4  pos  pos   NA    pos
    5 group5  neg  neg  neg    neg
    6 group6   NA   NA  neg    neg
    7 group7  pos  neg  neg   both
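
The apply approach does not cover rows that consist only of "NA", or rows with values other than "pos" and "neg". A possible extension is sketched below; classify_row is a hypothetical helper, not from the original answer, and it treats both the string "NA" and real NA as missing.

```r
# Hypothetical helper: label one row's values as "pos", "neg", "both",
# NA (nothing but missing values) or "unknown" (unexpected strings)
classify_row <- function(x) {
  # Drop real NAs and the literal string "NA", then keep distinct values
  vals <- unique(x[!is.na(x) & x != "NA"])
  if (length(vals) == 0) return(NA_character_)
  if (!all(vals %in% c("pos", "neg"))) return("unknown")
  if (length(vals) == 2) "both" else vals
}

# The 7-row toy data from the question
toy.df <- data.frame(
  Name = paste0("group", 1:7),
  col1 = c("pos", "neg", "NA", "pos", "neg", "NA", "pos"),
  col2 = c("pos", "pos", "NA", "pos", "neg", "NA", "neg"),
  col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg")
)

# Apply the helper to the three value columns, one row at a time
toy.df$col4 <- apply(toy.df[, c("col1", "col2", "col3")], 1, classify_row)
toy.df

# Edge cases that the 7-row data does not exercise
classify_row(c("NA", NA, "NA"))
classify_row(c("pos", "typo", "pos"))
```

Selecting the columns by name instead of by position (x[2:4]) keeps the code working if the column order changes.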

huangapple
  • Published on April 20, 2023, 08:44:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76059775.html