如何根据条件和子字符串将列中的值安排到新列中?

huangapple go评论99阅读模式
英文:

How to arrange the values in a column to a new column based on a condition and a substring?

问题

我理解你的请求。以下是翻译好的部分:

var text var_field new_col
A happy A$excited D
B sad B$angry C
C angry C$sad B
D excited D$happy A
E NA E$nervous NA
F NA F$blue NA
G NA G$lonely NA
英文:

I have a data frame including three columns as below. I need to add a new column based on the following condition:
If in var_field, the string following $ equals the string in text, put the corresponding value in var in the new column called new_col. When text equals NA, the new_col should remain NA as well. I would really appreciate your advice.

var text var_field
A happy A$excited
B sad B$angry
C angry C$sad
D excited D$happy
E NA E$nervous
F NA F$blue
G NA G$lonely

The expected new column should look like column "new_col".

var text var_field new_col
A happy A$excited D
B sad B$angry C
C angry C$sad B
D excited D$happy A
E NA E$nervous NA
F NA F$blue NA
G NA G$lonely NA

答案1

得分: 2

使用各种 tidyverse 函数的另一种方法。

设置测试数据:

  1. testdata <- tribble(
  2. ~var, ~text, ~var_field,
  3. "A", "happy", "A$excited",
  4. "B", "sad", "B$angry",
  5. "C", "angry", "C$sad",
  6. "D", "excited", "D$happy",
  7. "E", NA, "E$nervous",
  8. "F", NA, "F$blue",
  9. "G", NA, "G$lonely")

text 返回哪个 var 创建一个引用:

  1. lookup <- as_vector(testdata$var)
  2. names(lookup) <- testdata$text

然后进行新列的创建:

  1. testdata %>%
  2. mutate(
  3. field_text = str_extract(var_field, "(?<=\$)(.*)"), # 去掉前导字符和 "$"
  4. new_col = case_when(
  5. is.na(text) ~ NA_character_,
  6. .default = lookup[field_text]
  7. ) # 根据规范创建新列
  8. ) %>%
  9. select(-field_text) # 不再需要简化的 var_field,因此删除

结果如下:

  1. # A tibble: 7 × 4
  2. var text var_field new_col
  3. <chr> <chr> <chr> <chr>
  4. 1 A happy A$excited D
  5. 2 B sad B$angry C
  6. 3 C angry C$sad B
  7. 4 D excited D$happy A
  8. 5 E NA E$nervous NA
  9. 6 F NA F$blue NA
  10. 7 G NA G$lonely NA

编辑:这假设 text 下的非 NA 选项是唯一的,这是不正确的,根据 OP 的说法。

英文:

Another approach, using various tidyverse functions.

Set up test data:

  1. testdata &lt;- tribble(
  2. ~var, ~text, ~var_field,
  3. &quot;A&quot;, &quot;happy&quot;, &quot;A$excited&quot;,
  4. &quot;B&quot;, &quot;sad&quot;, &quot;B$angry&quot;,
  5. &quot;C&quot;, &quot;angry&quot;, &quot;C$sad&quot;,
  6. &quot;D&quot;, &quot;excited&quot;, &quot;D$happy&quot;,
  7. &quot;E&quot;, NA, &quot;E$nervous&quot;,
  8. &quot;F&quot;, NA, &quot;F$blue&quot;,
  9. &quot;G&quot;, NA, &quot;G$lonely&quot;)

create a reference for which text returns which var

  1. lookup &lt;- as_vector(testdata$var)
  2. names(lookup) &lt;- testdata$text

then do the creation of the new column

  1. testdata %&gt;% mutate(
  2. field_text = str_extract(var_field, &quot;(?&lt;=\$)(.*)&quot;), #drop the leading character and &quot;$&quot;
  3. new_col = case_when(
  4. is.na(text) ~ NA_character_,
  5. .default = lookup[field_text]
  6. ) # created the new_col as per spec
  7. ) %&gt;%
  8. select(-field_text) # drop the simplified var_field as no longer needed

gives

  1. # A tibble: 7 &#215; 4
  2. var text var_field new_col
  3. &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
  4. 1 A happy A$excited D
  5. 2 B sad B$angry C
  6. 3 C angry C$sad B
  7. 4 D excited D$happy A
  8. 5 E NA E$nervous NA
  9. 6 F NA F$blue NA
  10. 7 G NA G$lonely NA

Edit: this assumes that the non-NA options under text are unique, which is not correct according to OP.

答案2

得分: 2

对于基础R中的第一个匹配(1st(!) match):

  1. df_ <- read.table(header = T, text = "
  2. var text var_field
  3. A happy A$excited
  4. B sad B$angry
  5. C angry C$sad
  6. D excited D$happy
  7. E NA E$nervous
  8. F NA F$blue
  9. G NA G$lonely")
  10. suffix <- sapply(strsplit(df_$var_field, "$", fixed = TRUE), `[`, 2)
  11. df_$new_col <- df_$var[match(df_$text, suffix)]
  12. df_
  13. #> var text var_field new_col
  14. #> 1 A happy A$excited D
  15. #> 2 B sad B$angry C
  16. #> 3 C angry C$sad B
  17. #> 4 D excited D$happy A
  18. #> 5 E <NA> E$nervous <NA>
  19. #> 6 F <NA> F$blue <NA>
  20. #> 7 G <NA> G$lonely <NA>

创建于2023年06月08日,使用reprex v2.0.2

英文:

For 1st(!) match in base R:

  1. df_ &lt;- read.table(header = T, text = &quot;
  2. var text var_field
  3. A happy A$excited
  4. B sad B$angry
  5. C angry C$sad
  6. D excited D$happy
  7. E NA E$nervous
  8. F NA F$blue
  9. G NA G$lonely&quot;)
  10. suffix &lt;- sapply(strsplit(df_$var_field, &quot;$&quot;, fixed = TRUE), `[`, 2)
  11. df_$new_col &lt;- df_$var[match(df_$text, suffix)]
  12. df_
  13. #&gt; var text var_field new_col
  14. #&gt; 1 A happy A$excited D
  15. #&gt; 2 B sad B$angry C
  16. #&gt; 3 C angry C$sad B
  17. #&gt; 4 D excited D$happy A
  18. #&gt; 5 E &lt;NA&gt; E$nervous &lt;NA&gt;
  19. #&gt; 6 F &lt;NA&gt; F$blue &lt;NA&gt;
  20. #&gt; 7 G &lt;NA&gt; G$lonely &lt;NA&gt;

<sup>Created on 2023-06-08 with reprex v2.0.2</sup>

答案3

得分: 1

请尝试以下代码:

  1. quux %>%
  2. mutate(text2 = sub(".*\$", "", var_field)) %>%
  3. left_join(quux, by = c(text2 = "text"), suffix = c("", ".y"), multiple = "first") %>%
  4. mutate(new_col2 = var.y) %>%
  5. select(-ends_with(".y"), -text2)
  6. # var text var_field new_col new_col2
  7. # 1 A happy A$excited D D
  8. # 2 B sad B$angry C C
  9. # 3 C angry C$sad B B
  10. # 4 D excited D$happy A A
  11. # 5 E <NA> E$nervous <NA> <NA>
  12. # 6 F <NA> F$blue <NA> <NA>
  13. # 7 G <NA> G$lonely <NA> <NA>

数据

  1. quux <- structure(list(var = c("A", "B", "C", "D", "E", "F", "G"), text = c("happy", "sad", "angry", "excited", NA, NA, NA), var_field = c("A$excited", "B$angry", "C$sad", "D$happy", "E$nervous", "F$blue", "G$lonely"), new_col = c("D", "C", "B", "A", NA, NA, NA)), class = "data.frame", row.names = c(NA, -7L))
英文:

Try this:

  1. quux %&gt;%
  2. mutate(text2 = sub(&quot;.*\\$&quot;, &quot;&quot;, var_field)) %&gt;%
  3. left_join(quux, by = c(text2 = &quot;text&quot;), suffix = c(&quot;&quot;, &quot;.y&quot;), multiple = &quot;first&quot;) %&gt;%
  4. mutate(new_col2 = var.y) %&gt;%
  5. select(-ends_with(&quot;.y&quot;), -text2)
  6. # var text var_field new_col new_col2
  7. # 1 A happy A$excited D D
  8. # 2 B sad B$angry C C
  9. # 3 C angry C$sad B B
  10. # 4 D excited D$happy A A
  11. # 5 E &lt;NA&gt; E$nervous &lt;NA&gt; &lt;NA&gt;
  12. # 6 F &lt;NA&gt; F$blue &lt;NA&gt; &lt;NA&gt;
  13. # 7 G &lt;NA&gt; G$lonely &lt;NA&gt; &lt;NA&gt;

Data

  1. quux &lt;- structure(list(var = c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, &quot;D&quot;, &quot;E&quot;, &quot;F&quot;, &quot;G&quot;), text = c(&quot;happy&quot;, &quot;sad&quot;, &quot;angry&quot;, &quot;excited&quot;, NA, NA, NA), var_field = c(&quot;A$excited&quot;, &quot;B$angry&quot;, &quot;C$sad&quot;, &quot;D$happy&quot;, &quot;E$nervous&quot;, &quot;F$blue&quot;, &quot;G$lonely&quot;), new_col = c(&quot;D&quot;, &quot;C&quot;, &quot;B&quot;, &quot;A&quot;, NA, NA, NA)), class = &quot;data.frame&quot;, row.names = c(NA, -7L))

答案4

得分: 1

以下是另一种方法的翻译结果:

  1. library(data.table)
  2. as.data.table(tstrsplit(df$var_field, "$", fixed = TRUE))[
  3. df, on = .(V2 = text)][, .(var, text = V2, var_field, new_col = V1)]

输出:

  1. var text var_field new_col
  2. <char> <char> <char> <char>
  3. 1: A happy A$excited D
  4. 2: B sad B$angry C
  5. 3: C angry C$sad B
  6. 4: D excited D$happy A
  7. 5: E <NA> E$nervous <NA>
  8. 6: F <NA> F$blue <NA>
  9. 7: G <NA> G$lonely <NA>

请注意,输出中的<char>和是特殊的标记,表示字符和缺失值。

英文:

Here is another approach:

  1. library(data.table)
  2. as.data.table(tstrsplit(df$var_field, &quot;$&quot;,fixed = T))[
  3. df, on=.(V2 = text)][, .(var, text=V2,var_field,new_col=V1)]

Output:

  1. var text var_field new_col
  2. &lt;char&gt; &lt;char&gt; &lt;char&gt; &lt;char&gt;
  3. 1: A happy A$excited D
  4. 2: B sad B$angry C
  5. 3: C angry C$sad B
  6. 4: D excited D$happy A
  7. 5: E &lt;NA&gt; E$nervous &lt;NA&gt;
  8. 6: F &lt;NA&gt; F$blue &lt;NA&gt;
  9. 7: G &lt;NA&gt; G$lonely &lt;NA&gt;

huangapple
  • 本文由 发表于 2023年6月8日 20:44:59
  • 转载请务必保留本文链接:https://go.coder-hub.com/76431987.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定