Using gsub to separate a string into cities and countries

Question

I am trying to split a string into cities and countries, but I am having difficulty when the city or country is more than one word (e.g., aix-en-provence or United States). The code I am currently using works for most cases, such as Paris, France, but not for ones like those above.

locations
paris_france
miami_united states
new york_united states
aix-en-provence_france
auckland_new_zealand

Current code used:
city = gsub("([A-z]+)_([A-z]+)", "\\1", locations)
country = gsub("([A-z]+)_([A-z]+)", "\\2", locations)

So now city returns paris and country returns france, which is fine, but for other entries such as auckland_new_zealand, auckland and zealand are returned. I am guessing it is a case of getting it to recognize more than one word before or after the '_'.


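Answer 1

Score: 3

Because of new_zealand, we have to take a little extra care: split only on the first underscore and keep any later underscores as part of the country.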

base R

strcapture("^([^_]+)_(.*)$", locs$locations, proto = c(city="", country=""))
#              city       country
# 1           paris        france
# 2           miami united states
# 3        new york united states
# 4 aix-en-provence        france
# 5        auckland   new_zealand
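
Since the question was framed around gsub with backreferences, the same pattern can also be used that way. The following is only a sketch, assuming the values sit in a plain character vector named locations as in the question; sub() is used in place of gsub() because each string is matched once as a whole.

```r
# Same sample values as in the question, held in a plain character vector.
locations <- c("paris_france", "miami_united states", "new york_united states",
               "aix-en-provence_france", "auckland_new_zealand")

# [^_]+ matches everything up to the first underscore;
# (.*) captures the rest, including any later underscores.
pattern <- "^([^_]+)_(.*)$"

city    <- sub(pattern, "\\1", locations)  # first capture group
country <- sub(pattern, "\\2", locations)  # second capture group

print(data.frame(city, country))
```

Strings that contain no underscore at all would not match and would be returned unchanged in both vectors, so they may need separate handling.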

tidyr

library(tidyr)
separate_wider_delim(locs, locations, delim = "_", names = c("city", "country"), too_many = "merge")
# # A tibble: 5 × 2
#   city            country      
#   <chr>           <chr>        
# 1 paris           france       
# 2 miami           united states
# 3 new york        united states
# 4 aix-en-provence france       
# 5 auckland        new_zealand  
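
On a tidyr release older than 1.3.0, where separate_wider_delim() is not available, the older separate() function with extra = "merge" should give the same split. This is a sketch under that assumption rather than part of the original answer; out is just an illustrative name.

```r
library(tidyr)

# Same data as in the Data section.
locs <- data.frame(locations = c("paris_france", "miami_united states",
                                 "new york_united states",
                                 "aix-en-provence_france",
                                 "auckland_new_zealand"))

# extra = "merge" splits at most length(into) - 1 times, so everything
# after the first "_" stays together in the country column.
out <- separate(locs, locations, into = c("city", "country"),
                sep = "_", extra = "merge")
print(out)
```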

Data

locs <- structure(list(locations = c("paris_france", "miami_united states", "new york_united states", "aix-en-provence_france", "auckland_new_zealand")), row.names = c(NA, -5L), class = "data.frame")

Posted by huangapple on 2023-06-08 21:40:25
Link to this article: https://go.coder-hub.com/76432465.html