2023年6月8日 20:44:59go评论99阅读模式

英文:

How to arrange the values in a column to a new column based on a condition and a substring?

问题

我理解你的请求。以下是翻译好的部分：

var	text	var_field	new_col
A	happy	A$excited	D
B	sad	B$angry	C
C	angry	C$sad	B
D	excited	D$happy	A
E	NA	E$nervous	NA
F	NA	F$blue	NA
G	NA	G$lonely	NA

英文:

I have a data frame including three columns as below. I need to add a new column based on the following condition:
If in var_field, the string following $ equals the string in text, put the corresponding value in var in the new column called new_col. When text equals NA, the new_col should remain NA as well. I would really appreciate your advice.

var	text	var_field
A	happy	A$excited
B	sad	B$angry
C	angry	C$sad
D	excited	D$happy
E	NA	E$nervous
F	NA	F$blue
G	NA	G$lonely

The expected new column should look like column "new_col".

var	text	var_field	new_col
A	happy	A$excited	D
B	sad	B$angry	C
C	angry	C$sad	B
D	excited	D$happy	A
E	NA	E$nervous	NA
F	NA	F$blue	NA
G	NA	G$lonely	NA

答案1

得分: 2

使用各种 tidyverse 函数的另一种方法。

设置测试数据：

testdata <- tribble(
~var, ~text, ~var_field,
"A", "happy", "A$excited",
"B", "sad", "B$angry",
"C", "angry", "C$sad",
"D", "excited", "D$happy",
"E", NA, "E$nervous",
"F", NA, "F$blue",
"G", NA, "G$lonely")

为 text 返回哪个 var 创建一个引用：

lookup <- as_vector(testdata$var)
names(lookup) <- testdata$text

然后进行新列的创建：

testdata %>%
  mutate(
    field_text = str_extract(var_field, "(?<=\$)(.*)"), # 去掉前导字符和 "$"
    new_col = case_when(
      is.na(text) ~ NA_character_,
      .default = lookup[field_text]
    ) # 根据规范创建新列
  ) %>%
  select(-field_text) # 不再需要简化的 var_field，因此删除

结果如下：

# A tibble: 7 × 4
  var   text    var_field new_col
  <chr> <chr>   <chr>     <chr>  
1 A     happy   A$excited D      
2 B     sad     B$angry   C      
3 C     angry   C$sad     B      
4 D     excited D$happy   A      
5 E     NA      E$nervous NA     
6 F     NA      F$blue    NA     
7 G     NA      G$lonely  NA

编辑：这假设 text 下的非 NA 选项是唯一的，这是不正确的，根据 OP 的说法。

英文:

Another approach, using various tidyverse functions.

Set up test data:

testdata &lt;- tribble(
~var, ~text, ~var_field,
&quot;A&quot;,	&quot;happy&quot;,	&quot;A$excited&quot;,
&quot;B&quot;,	&quot;sad&quot;,	&quot;B$angry&quot;,
&quot;C&quot;,	&quot;angry&quot;,	&quot;C$sad&quot;,
&quot;D&quot;,	&quot;excited&quot;,	&quot;D$happy&quot;,
&quot;E&quot;,	NA,	&quot;E$nervous&quot;,
&quot;F&quot;,	NA,	&quot;F$blue&quot;,
&quot;G&quot;,	NA,	&quot;G$lonely&quot;)

create a reference for which text returns which var

lookup &lt;- as_vector(testdata$var)
names(lookup) &lt;- testdata$text

then do the creation of the new column

testdata %&gt;% mutate(
  field_text = str_extract(var_field, &quot;(?&lt;=\$)(.*)&quot;), #drop the leading character and &quot;$&quot;
  new_col = case_when(
    is.na(text) ~ NA_character_,
    .default = lookup[field_text]
  ) # created the new_col as per spec
) %&gt;% 
  select(-field_text) # drop the simplified var_field as no longer needed

gives

# A tibble: 7 &#215; 4
  var   text    var_field new_col
  &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt;     &lt;chr&gt;  
1 A     happy   A$excited D      
2 B     sad     B$angry   C      
3 C     angry   C$sad     B      
4 D     excited D$happy   A      
5 E     NA      E$nervous NA     
6 F     NA      F$blue    NA     
7 G     NA      G$lonely  NA

Edit: this assumes that the non-NA options under text are unique, which is not correct according to OP.

答案2

得分: 2

对于基础R中的第一个匹配（1st(!) match）：

df_ <- read.table(header = T, text = "
var text    var_field
A   happy   A$excited
B   sad B$angry
C   angry   C$sad
D   excited D$happy
E   NA  E$nervous
F   NA  F$blue
G   NA  G$lonely")
suffix <- sapply(strsplit(df_$var_field, "$", fixed = TRUE), `[`, 2)
df_$new_col <- df_$var[match(df_$text, suffix)]
df_
#>   var    text var_field new_col
#> 1   A   happy A$excited       D
#> 2   B     sad   B$angry       C
#> 3   C   angry     C$sad       B
#> 4   D excited   D$happy       A
#> 5   E    <NA> E$nervous     <NA>
#> 6   F    <NA>    F$blue     <NA>
#> 7   G    <NA>  G$lonely     <NA>

^{创建于2023年06月08日，使用reprex v2.0.2}

英文:

For 1st(!) match in base R:

df_ &lt;- read.table(header = T, text = &quot;
var text    var_field
A   happy   A$excited
B   sad B$angry
C   angry   C$sad
D   excited D$happy
E   NA  E$nervous
F   NA  F$blue
G   NA  G$lonely&quot;)
suffix &lt;- sapply(strsplit(df_$var_field, &quot;$&quot;, fixed = TRUE), `[`, 2)
df_$new_col &lt;- df_$var[match(df_$text, suffix)]
df_
#&gt;   var    text var_field new_col
#&gt; 1   A   happy A$excited       D
#&gt; 2   B     sad   B$angry       C
#&gt; 3   C   angry     C$sad       B
#&gt; 4   D excited   D$happy       A
#&gt; 5   E    &lt;NA&gt; E$nervous    &lt;NA&gt;
#&gt; 6   F    &lt;NA&gt;    F$blue    &lt;NA&gt;
#&gt; 7   G    &lt;NA&gt;  G$lonely    &lt;NA&gt;

<sup>Created on 2023-06-08 with reprex v2.0.2</sup>

答案3

得分: 1

请尝试以下代码：

quux %>% 
  mutate(text2 = sub(".*\$", "", var_field)) %>% 
  left_join(quux, by = c(text2 = "text"), suffix = c("", ".y"), multiple = "first") %>% 
  mutate(new_col2 = var.y) %>% 
  select(-ends_with(".y"), -text2)
#   var    text var_field new_col new_col2
# 1   A   happy A$excited       D        D
# 2   B     sad   B$angry       C        C
# 3   C   angry     C$sad       B        B
# 4   D excited   D$happy       A        A
# 5   E    <NA> E$nervous    <NA>     <NA>
# 6   F    <NA>    F$blue    <NA>     <NA>
# 7   G    <NA>  G$lonely    <NA>     <NA>

数据

quux <- structure(list(var = c("A", "B", "C", "D", "E", "F", "G"), text = c("happy", "sad", "angry", "excited", NA, NA, NA), var_field = c("A$excited", "B$angry", "C$sad", "D$happy", "E$nervous", "F$blue", "G$lonely"), new_col = c("D", "C", "B", "A", NA, NA, NA)), class = "data.frame", row.names = c(NA, -7L))

英文:

Try this:

quux %&gt;%
  mutate(text2 = sub(&quot;.*\\$&quot;, &quot;&quot;, var_field)) %&gt;%
  left_join(quux, by = c(text2 = &quot;text&quot;), suffix = c(&quot;&quot;, &quot;.y&quot;), multiple = &quot;first&quot;) %&gt;%
  mutate(new_col2 = var.y) %&gt;%
  select(-ends_with(&quot;.y&quot;), -text2)
#   var    text var_field new_col new_col2
# 1   A   happy A$excited       D        D
# 2   B     sad   B$angry       C        C
# 3   C   angry     C$sad       B        B
# 4   D excited   D$happy       A        A
# 5   E    &lt;NA&gt; E$nervous    &lt;NA&gt;     &lt;NA&gt;
# 6   F    &lt;NA&gt;    F$blue    &lt;NA&gt;     &lt;NA&gt;
# 7   G    &lt;NA&gt;  G$lonely    &lt;NA&gt;     &lt;NA&gt;

Data

quux &lt;- structure(list(var = c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, &quot;D&quot;, &quot;E&quot;, &quot;F&quot;, &quot;G&quot;), text = c(&quot;happy&quot;, &quot;sad&quot;, &quot;angry&quot;, &quot;excited&quot;, NA, NA, NA), var_field = c(&quot;A$excited&quot;, &quot;B$angry&quot;, &quot;C$sad&quot;, &quot;D$happy&quot;, &quot;E$nervous&quot;, &quot;F$blue&quot;, &quot;G$lonely&quot;), new_col = c(&quot;D&quot;, &quot;C&quot;, &quot;B&quot;, &quot;A&quot;, NA, NA, NA)), class = &quot;data.frame&quot;, row.names = c(NA, -7L))

答案4

得分: 1

以下是另一种方法的翻译结果：

library(data.table)
as.data.table(tstrsplit(df$var_field, "$", fixed = TRUE))[
  df, on = .(V2 = text)][, .(var, text = V2, var_field, new_col = V1)]

输出：

      var    text var_field new_col
   <char>  <char>    <char>  <char>
1:      A   happy A$excited       D
2:      B     sad   B$angry       C
3:      C   angry     C$sad       B
4:      D excited   D$happy       A
5:      E    <NA> E$nervous    <NA>
6:      F    <NA>    F$blue    <NA>
7:      G    <NA>  G$lonely    <NA>

请注意，输出中的<char>和是特殊的标记，表示字符和缺失值。

英文:

Here is another approach:

library(data.table)
as.data.table(tstrsplit(df$var_field, &quot;$&quot;,fixed = T))[
  df, on=.(V2 = text)][, .(var, text=V2,var_field,new_col=V1)]

Output:

      var    text var_field new_col
   &lt;char&gt;  &lt;char&gt;    &lt;char&gt;  &lt;char&gt;
1:      A   happy A$excited       D
2:      B     sad   B$angry       C
3:      C   angry     C$sad       B
4:      D excited   D$happy       A
5:      E    &lt;NA&gt; E$nervous    &lt;NA&gt;
6:      F    &lt;NA&gt;    F$blue    &lt;NA&gt;
7:      G    &lt;NA&gt;  G$lonely    &lt;NA&gt;

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何根据条件和子字符串将列中的值安排到新列中？

问题

答案1

答案2

答案3

答案4

使用R tidyverse根据固定变量创建重复行

如何将具有不同列数的频率表进行列绑定？

识别 R 中序列中大致等值数值的序列

文本对齐方式按小数点 (Swift)

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。