Using gsub to separate a string into cities and countries

Question

I am trying to split a string into cities and countries, but I'm having difficulty when the city or country is more than one word (e.g. aix-en-provence or United States). My current code works for most entries, such as Paris, France, but not for ones like those above.

My data:

    locations
    paris_france
    miami_united states
    new york_united states
    aix-en-provence_france
    auckland_new_zealand

Current code used:

    city = gsub("([A-z]+)_([A-z]+)", "\\1", locations)
    country = gsub("([A-z]+)_([A-z]+)", "\\2", locations)

So now city returns paris and country returns france, which is fine, but for entries like auckland_new_zealand the country comes back as just zealand. I'm guessing it's a case of getting the pattern to recognize more than one word before or after the '_'.
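For reference, here is a minimal sketch that stays with the sub()/gsub() approach from the question (base R only; the vector `locations` is typed in from the list above). Instead of describing the words on each side of the separator, it describes the separator itself: everything before the first underscore is the city, everything after it is the country. As an aside, `[A-z]` in the original pattern is probably meant to be `[A-Za-z]`: in ASCII order the range `A-z` also covers `[`, `\`, `]`, `^`, `_` and the backtick, and it never matches spaces or hyphens, which is why "united states" and "aix-en-provence" break.

```r
# The five example strings from the question
locations <- c("paris_france", "miami_united states", "new york_united states",
               "aix-en-provence_france", "auckland_new_zealand")

# City: delete from the first underscore to the end of the string.
# The regex engine returns the leftmost match, so "_.*$" starts at the
# first "_" and ".*" swallows everything after it (spaces, hyphens, "_").
city <- sub("_.*$", "", locations)

# Country: delete everything up to and including the first underscore.
# "[^_]*" cannot run past an underscore, so only the first one is removed.
country <- sub("^[^_]*_", "", locations)

data.frame(city, country)
```

sub() is used because each pattern can match at most once per string anyway (one is anchored at `^`, the other runs to `$`), so gsub() should return the same values here.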

Answer 1

Score: 3

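Because of new_zealand, we have to be a little more careful: only the first underscore separates the city from the country, so the string must not be split at every underscore.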
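To make that caution concrete: splitting on every underscore gives three pieces for auckland_new_zealand. A small base-R sketch (an illustration, separate from the two approaches in this answer) finds the first underscore with regexpr() and cuts each string around it with substr(). It assumes every string contains at least one underscore.

```r
locations <- c("paris_france", "miami_united states", "new york_united states",
               "aix-en-provence_france", "auckland_new_zealand")

# Splitting on every "_" is the trap: the last string breaks into 3 pieces
n_pieces <- lengths(strsplit(locations, "_", fixed = TRUE))
print(n_pieces)
# [1] 2 2 2 2 3

# Character position of the FIRST "_" in each string
# (assumes every string contains at least one underscore)
first_us <- regexpr("_", locations, fixed = TRUE)

# Everything before the first "_" is the city, everything after it the country
city    <- substr(locations, 1, first_us - 1)
country <- substr(locations, first_us + 1, nchar(locations))
```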

base R

    strcapture("^([^_]+)_(.*)$", locs$locations, proto = c(city = "", country = ""))
    #              city       country
    # 1           paris        france
    # 2           miami united states
    # 3        new york united states
    # 4 aix-en-provence        france
    # 5        auckland   new_zealand
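If the goal is to add the two new columns to `locs` itself, strcapture() can be combined with cbind(). In this sketch `proto` is a zero-row data frame, which fixes the column names and types, and a made-up value without an underscore ("nowhere", included only for illustration) shows that strings which do not match the pattern come back as NA rather than raising an error.

```r
# Example input; "nowhere" is made up to show a string with no underscore
locs <- data.frame(
  locations = c("paris_france", "auckland_new_zealand", "nowhere"),
  stringsAsFactors = FALSE
)

# A zero-row data frame as 'proto' fixes the names and types of the
# two capture groups (both character here)
parts <- strcapture(
  "^([^_]+)_(.*)$", locs$locations,
  proto = data.frame(city = character(), country = character(),
                     stringsAsFactors = FALSE)
)

# Strings that do not match the pattern get NA in both new columns
locs <- cbind(locs, parts)
print(locs)
```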

tidyr

    library(tidyr)
    separate_wider_delim(locs, locations, delim = "_", names = c("city", "country"), too_many = "merge")
    # # A tibble: 5 × 2
    #   city            country
    #   <chr>           <chr>
    # 1 paris           france
    # 2 miami           united states
    # 3 new york        united states
    # 4 aix-en-provence france
    # 5 auckland        new_zealand
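For readers on a tidyr older than 1.3.0 (the release that introduced separate_wider_delim()), here is a rough base-R approximation of `too_many = "merge"` for two output columns: split on every underscore, keep the first piece as the city, and paste the remaining pieces back together with the delimiter. `split_first()` is a hypothetical helper name, not a function from either package.

```r
# Hypothetical helper (not from base R or tidyr): split at every delimiter,
# keep piece 1 as the city and glue the rest back together, mimicking
# too_many = "merge" for two output columns
split_first <- function(x, delim = "_") {
  pieces <- strsplit(x, delim, fixed = TRUE)
  data.frame(
    city = vapply(pieces, function(p) p[1], character(1)),
    country = vapply(pieces, function(p) paste(p[-1], collapse = delim),
                     character(1)),
    stringsAsFactors = FALSE
  )
}

locations <- c("paris_france", "miami_united states", "new york_united states",
               "aix-en-provence_france", "auckland_new_zealand")
res <- split_first(locations)
print(res)
```

One behavioral difference to be aware of: separate_wider_delim() stops with an error by default (`too_few = "error"`) when a string has no delimiter, whereas this sketch silently returns the whole string as the city and an empty country.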

Data

    locs <- structure(list(locations = c("paris_france", "miami_united states", "new york_united states", "aix-en-provence_france", "auckland_new_zealand")), row.names = c(NA, -5L), class = "data.frame")

huangapple
  • Published on June 8, 2023 at 21:40:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76432465.html