How to arrange the values in a column to a new column based on a condition and a substring?
Question
I have a data frame with three columns, shown below. I need to add a new column based on the following condition: if the string following `$` in `var_field` equals the string in `text`, put the corresponding value of `var` in a new column called `new_col`. When `text` is NA, `new_col` should remain NA as well. I would really appreciate your advice.
var | text | var_field |
---|---|---|
A | happy | A$excited |
B | sad | B$angry |
C | angry | C$sad |
D | excited | D$happy |
E | NA | E$nervous |
F | NA | F$blue |
G | NA | G$lonely |
The expected new column should look like column "new_col".
var | text | var_field | new_col |
---|---|---|---|
A | happy | A$excited | D |
B | sad | B$angry | C |
C | angry | C$sad | B |
D | excited | D$happy | A |
E | NA | E$nervous | NA |
F | NA | F$blue | NA |
G | NA | G$lonely | NA |
Answer 1
Score: 2
Another approach, using various `tidyverse` functions.
Set up test data:
library(tidyverse) # provides tribble(), as_vector(), mutate(), str_extract(), case_when()

testdata <- tribble(
  ~var, ~text,     ~var_field,
  "A",  "happy",   "A$excited",
  "B",  "sad",     "B$angry",
  "C",  "angry",   "C$sad",
  "D",  "excited", "D$happy",
  "E",  NA,        "E$nervous",
  "F",  NA,        "F$blue",
  "G",  NA,        "G$lonely")
Create a named lookup vector that records which `text` value returns which `var`:
lookup <- as_vector(testdata$var)
names(lookup) <- testdata$text
Then create the new column:
testdata %>%
  mutate(
    field_text = str_extract(var_field, "(?<=\\$)(.*)"), # keep only the text after "$"
    new_col = case_when(
      is.na(text) ~ NA_character_,
      .default = lookup[field_text]
    ) # look up the var whose text equals field_text
  ) %>%
  select(-field_text) # drop the helper column, which is no longer needed
This gives:
# A tibble: 7 × 4
var text var_field new_col
<chr> <chr> <chr> <chr>
1 A happy A$excited D
2 B sad B$angry C
3 C angry C$sad B
4 D excited D$happy A
5 E NA E$nervous NA
6 F NA F$blue NA
7 G NA G$lonely NA
Edit: this assumes that the non-NA values in `text` are unique, which, according to the OP, is not the case.
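The uniqueness caveat can be illustrated with a small base R sketch. The data here are hypothetical (a row `H` that repeats `happy` is not in the question), and joining multiple matches with `;` is just one possible convention, not something the OP asked for:

```r
# Hypothetical data: "happy" appears twice in text (rows A and H).
var    <- c("A", "H", "D")
text   <- c("happy", "happy", "excited")
suffix <- c("excited", "excited", "happy") # the part of var_field after "$"

# Named-vector lookup, as in the answer: a duplicated name returns
# only its first entry, so "H" is silently ignored.
lookup <- setNames(var, text)
first_only <- unname(lookup[suffix])

# Alternative (an assumption about the desired behaviour): collect every
# var whose text equals the suffix and join the hits with ";".
all_matches <- vapply(suffix, function(s) {
  hits <- var[!is.na(text) & text == s]
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = ";")
}, character(1), USE.NAMES = FALSE)
```

With these toy values, `first_only` keeps only `A` for the suffix `happy`, while `all_matches` reports both `A` and `H`.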
Answer 2
Score: 2
To get the first (!) match, in base R:
df_ <- read.table(header = TRUE, text = "
var text var_field
A happy A$excited
B sad B$angry
C angry C$sad
D excited D$happy
E NA E$nervous
F NA F$blue
G NA G$lonely")
suffix <- sapply(strsplit(df_$var_field, "$", fixed = TRUE), `[`, 2)
df_$new_col <- df_$var[match(df_$text, suffix)]
df_
#> var text var_field new_col
#> 1 A happy A$excited D
#> 2 B sad B$angry C
#> 3 C angry C$sad B
#> 4 D excited D$happy A
#> 5 E <NA> E$nervous <NA>
#> 6 F <NA> F$blue <NA>
#> 7 G <NA> G$lonely <NA>
<sup>Created on 2023-06-08 with reprex v2.0.2</sup>
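One detail worth checking against your own data: `match(df_$text, suffix)` looks up each row's `text` among the suffixes, whereas the condition could also be read as looking up each row's suffix among the `text` values. On the sample data the two readings agree, because the pairs are symmetric (A with D, B with C). A minimal sketch with hypothetical, non-symmetric data (the `toy` data frame is invented for illustration) shows where they differ:

```r
# Hypothetical data in which the text/suffix pairing is not symmetric.
toy <- data.frame(
  var       = c("A", "B", "C"),
  text      = c("x", "y", "z"),
  var_field = c("A$y", "B$z", "C$x"),
  stringsAsFactors = FALSE
)
suffix <- sapply(strsplit(toy$var_field, "$", fixed = TRUE), `[`, 2)

# Reading 1 (as in the answer above): find each row's text among the suffixes.
text_in_suffix <- toy$var[match(toy$text, suffix)]

# Reading 2: find each row's suffix among the text values,
# keeping NA wherever text itself is NA.
suffix_in_text <- ifelse(is.na(toy$text), NA_character_,
                         toy$var[match(suffix, toy$text)])
```

Here reading 1 yields `C, A, B` and reading 2 yields `B, C, A`, so it is worth confirming which direction is intended before applying either to real data.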
Answer 3
Score: 1
Try this:
library(dplyr)

quux %>%
  mutate(text2 = sub(".*\\$", "", var_field)) %>% # text after "$"
  left_join(quux, by = c(text2 = "text"), suffix = c("", ".y"), multiple = "first") %>%
  mutate(new_col2 = var.y) %>%
  select(-ends_with(".y"), -text2) # drop the join helpers
# var text var_field new_col new_col2
# 1 A happy A$excited D D
# 2 B sad B$angry C C
# 3 C angry C$sad B B
# 4 D excited D$happy A A
# 5 E <NA> E$nervous <NA> <NA>
# 6 F <NA> F$blue <NA> <NA>
# 7 G <NA> G$lonely <NA> <NA>
Data
quux <- structure(list(
  var = c("A", "B", "C", "D", "E", "F", "G"),
  text = c("happy", "sad", "angry", "excited", NA, NA, NA),
  var_field = c("A$excited", "B$angry", "C$sad", "D$happy", "E$nervous", "F$blue", "G$lonely"),
  new_col = c("D", "C", "B", "A", NA, NA, NA)
), class = "data.frame", row.names = c(NA, -7L))
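The answers on this page pull out the part after `$` in three different ways: a lookbehind pattern, a greedy `sub()`, and a literal split. The sketch below compares them in base R; the `regexpr()` call with `perl = TRUE` stands in for `str_extract()`, and the `tricky` value with two `$` signs is hypothetical, not taken from the question:

```r
var_field <- c("A$excited", "B$angry", "C$sad", "D$happy")

# 1. Greedy sub(): removes everything up to and including the last "$".
by_sub <- sub(".*\\$", "", var_field)

# 2. Lookbehind (needs perl = TRUE in base R): everything after the first "$".
by_lookbehind <- regmatches(var_field,
                            regexpr("(?<=\\$).*", var_field, perl = TRUE))

# 3. Literal split on "$": the second piece.
by_split <- sapply(strsplit(var_field, "$", fixed = TRUE), `[`, 2)

# A hypothetical value with two "$" signs shows where the three diverge.
tricky <- "A$b$c"
tricky_sub        <- sub(".*\\$", "", tricky)
tricky_lookbehind <- regmatches(tricky, regexpr("(?<=\\$).*", tricky, perl = TRUE))
tricky_split      <- sapply(strsplit(tricky, "$", fixed = TRUE), `[`, 2)
```

For values with exactly one `$`, as in the question, all three give the same result. For `"A$b$c"` they return `"c"`, `"b$c"`, and `"b"` respectively, so the choice only matters if `var_field` can contain more than one `$`.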
Answer 4
Score: 1
Here is another approach:
library(data.table)
# `df` is assumed to be the question's data frame (columns var, text, var_field)
as.data.table(tstrsplit(df$var_field, "$", fixed = TRUE))[
  df, on = .(V2 = text)][, .(var, text = V2, var_field, new_col = V1)]
Output:
var text var_field new_col
<char> <char> <char> <char>
1: A happy A$excited D
2: B sad B$angry C
3: C angry C$sad B
4: D excited D$happy A
5: E <NA> E$nervous <NA>
6: F <NA> F$blue <NA>
7: G <NA> G$lonely <NA>