Extract the values after a word/letters in one column into a new column
Question
Based on the data below, how can I create another column containing the values after the letters `US` in every row of column `GEOID`?
Data:
id = c(1, 2, 3, 4, 5)
GEOID = c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
"1400000US42101000104", "1400000US42101000105")
df = data.frame(id, GEOID)
Desired output:
id = c(1, 2, 3, 4, 5)
GEOID = c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
"1400000US42101000104", "1400000US42101000105")
GEOID_Final = c("42101000101", "42101000102", "42101000103", "42101000104", "42101000105")
Desired_df = data.frame(id, GEOID, GEOID_Final)
Code:
library(stringr)
library(dplyr)
desired_df = df %>% word(?, sep = "US") # Stuck
Answer 1
Score: 3
I would use a regular expression to match all the digits before "US", plus "US" itself, remove them from the string, and keep only the digits after US.
library(stringr); library(dplyr)
id = c(1, 2, 3, 4, 5)
GEOID = c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
"1400000US42101000104", "1400000US42101000105")
df = data.frame(id, GEOID)
df %>%
  mutate(
    # replace every digit before US, and US itself, with an empty string
    GEOID_final = str_replace(GEOID, pattern = "\\d+US", replacement = "")
  )
# id GEOID GEOID_final
# 1 1 1400000US42101000101 42101000101
# 2 2 1400000US42101000102 42101000102
# 3 3 1400000US42101000103 42101000103
# 4 4 1400000US42101000104 42101000104
# 5 5 1400000US42101000105 42101000105
`str_replace` takes a string (in this case, a character vector, `GEOID`) and replaces whatever matches `pattern` with `replacement`. The pattern I use is `\\d+US` (the backslash is doubled inside an R string), which means 'match one or more digits, followed by US'. The match is replaced with an empty string, which keeps only the digits after US. You could also use `.+US`, which means 'match anything before US, and US itself'.
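If you would rather not depend on stringr, the same replacement can be sketched in base R with `sub()`, which, like `str_replace()`, replaces only the first match of the pattern. This is a minimal sketch assuming the `df` from the question; it also checks that the `.+US` variant gives the same result on this data. One caveat: `.+` is greedy, so for a value containing "US" twice (the `x` below is a hypothetical example, not from the question), `.+US` strips everything up to the last "US", while `\\d+US` removes only the leftmost digits-plus-US match.

```r
# Sample data from the question
id <- c(1, 2, 3, 4, 5)
GEOID <- c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
           "1400000US42101000104", "1400000US42101000105")
df <- data.frame(id, GEOID, stringsAsFactors = FALSE)

# Base-R equivalent of str_replace(GEOID, "\\d+US", ""):
# sub() replaces the first match of "one or more digits, then US"
df$GEOID_Final <- sub("\\d+US", "", df$GEOID)

# The ".+US" alternative gives the same result on this data
stopifnot(identical(df$GEOID_Final, sub(".+US", "", df$GEOID)))

# Hypothetical value with "US" twice: the two patterns now differ
x <- "1400000US12US34"
sub("\\d+US", "", x)  # "12US34" (only the leftmost digits + US removed)
sub(".+US", "", x)    # "34" (greedy .+ runs up to the last US)

print(df)
```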
Hope this helps!
Answer 2
Score: 1
If you want to use `word()`, as in your attempt:
desired_df = df %>% word(?, sep = "US") # Stuck
try this with `stringr::word`:
word(string = str_replace(GEOID, pattern = 'US', replacement = " "), start = 2, end = 2)
`str_replace` turns the first "US" into a space, and `word(..., start = 2, end = 2)` then returns the second space-separated word, i.e. the digits after "US".
Result:
> word(string = str_replace(GEOID, pattern = 'US', replacement = " "), start = 2, end = 2)
[1] "42101000101" "42101000102" "42101000103" "42101000104" "42101000105"
Save them in a third column, or use the same expression inside `mutate()`:
df %>% mutate(
  GEOID_Final = word(string = str_replace(GEOID, pattern = 'US', replacement = " "), start = 2, end = 2)
)
Result:
id GEOID GEOID_Final
1 1 1400000US42101000101 42101000101
2 2 1400000US42101000102 42101000102
3 3 1400000US42101000103 42101000103
4 4 1400000US42101000104 42101000104
5 5 1400000US42101000105 42101000105
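The `word()` idea (split on "US" and take the second piece) can also be sketched in base R with `strsplit()`, which avoids the intermediate `str_replace()` step. This is a minimal sketch assuming every `GEOID` contains "US" exactly once; `fixed = TRUE` treats "US" as a literal string rather than a regex. With stringr installed, `word()`'s own `sep` argument should do the same in a single call, `word(GEOID, 2, sep = "US")`, and the block cross-checks that only when the package is available.

```r
# Sample data from the question
id <- c(1, 2, 3, 4, 5)
GEOID <- c("1400000US42101000101", "1400000US42101000102", "1400000US42101000103",
           "1400000US42101000104", "1400000US42101000105")
df <- data.frame(id, GEOID, stringsAsFactors = FALSE)

# Split each GEOID at the literal "US": "1400000US42101000101" -> c("1400000", "42101000101")
pieces <- strsplit(df$GEOID, "US", fixed = TRUE)

# Keep the second piece of every split (the digits after "US");
# a value without "US" would yield NA here
df$GEOID_Final <- vapply(pieces, function(p) p[2], character(1))

# Optional cross-check: stringr::word() accepts a `sep` pattern directly
if (requireNamespace("stringr", quietly = TRUE)) {
  stopifnot(all(stringr::word(df$GEOID, 2, sep = "US") == df$GEOID_Final))
}

print(df)
```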