Using gsub to separate a string into cities and countries
Question
I am trying to split a string into cities and countries, but I am having difficulty when the city or country is more than one word (e.g. aix-en-provence or United States). The code I am currently using works for most entries, like Paris, France, but not for ones like those above.
locations
paris_france
miami_united states
new york_united states
aix-en-provence_france
auckland_new_zealand
current code used
city = gsub("([A-z]+)_([A-z]+)", "\\1", locations)
country = gsub("([A-z]+)_([A-z]+)", "\\2", locations)
So city returns paris and country returns france, which is fine, but for something like auckland_new_zealand I get auckland and zealand back. I'm guessing it's a matter of getting the pattern to recognize more than one word before or after the '_'.
Answer 1
Score: 3
Because of new_zealand, we have to take a little extra care.
base R
strcapture("^([^_]+)_(.*)$", locs$locations, proto = c(city="", country=""))
#              city       country
# 1           paris        france
# 2           miami united states
# 3        new york united states
# 4 aix-en-provence        france
# 5        auckland   new_zealand
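If you would rather stay close to the original gsub attempt, here is a minimal sketch of the same idea with sub(): everything before the first underscore is treated as the city, and everything after it as the country. It assumes a city name never contains an underscore. The vector name `locations` and the `pattern` variable are only for illustration. Part of the trouble with the original pattern is that [A-z] does not match spaces or hyphens, and in ASCII order the range also covers a few punctuation characters, including the underscore itself.

```r
# Sample data, same values as in the question
locations <- c("paris_france", "miami_united states", "new york_united states",
               "aix-en-provence_france", "auckland_new_zealand")

# [^_]+ matches everything up to the first underscore (spaces and hyphens included);
# .* then takes the rest of the string, including any later underscores
pattern <- "^([^_]+)_(.*)$"
city    <- sub(pattern, "\\1", locations)
country <- sub(pattern, "\\2", locations)

data.frame(city, country)
```

Since the pattern is anchored at both ends, it can match at most once per string, so sub() and gsub() behave the same here.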
tidyr
library(tidyr)
separate_wider_delim(locs, locations, delim = "_", names = c("city", "country"), too_many = "merge")
# # A tibble: 5 × 2
#   city            country
#   <chr>           <chr>
# 1 paris           france
# 2 miami           united states
# 3 new york        united states
# 4 aix-en-provence france
# 5 auckland        new_zealand
Data
locs <- structure(
  list(locations = c("paris_france", "miami_united states", "new york_united states",
                     "aix-en-provence_france", "auckland_new_zealand")),
  row.names = c(NA, -5L), class = "data.frame")