How to split one column into multiple columns by delimiter (with a varying number of delimiters)
Question
I have a dataframe like this one:
continent <- c("Europe", "Asia")
country <- c("France;Germany;Italy", "Japan")
start_problem <- data.frame(continent, country)
start_problem
I would like to separate the values in the `country` column into multiple columns, one for each country. The end result should look like this:
continent <- c("Europe", "Asia")
country1 <- c("France", "Japan")
country2 <- c("Germany", NA)
country3 <- c("Italy", NA)
goal <- data.frame(continent, country1, country2, country3)
goal
Using `separate_wider_delim()` does not work because not every continent has the same number of countries, so the number of delimiters in the original column varies.
Thanks in advance
Answer 1
Score: 1
We can first work out how many columns are needed by counting the maximum number of occurrences of the delimiter `;` and adding one. Then combine that number with the string "country" to build the column names for the `into =` argument of `separate()`.
library(tidyverse)
col_number <- max(str_count(start_problem$country, ";") + 1)
start_problem %>% separate(country, 
                           into = paste0("country", seq_len(col_number)), 
                           sep = ";")
  continent country1 country2 country3
1    Europe   France  Germany    Italy
2      Asia    Japan     <NA>     <NA>
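For reference, `separate()` is superseded as of tidyr 1.3.0, and the call above emits a warning about missing pieces unless `fill` is set. The following is a minimal sketch of the same idea with `separate_wider_delim()`, assuming tidyr 1.3.0 or later is installed. Here `names_sep = ""` generates the column names from the input column name plus a sequence number, and `too_few = "align_start"` pads shorter rows with `NA` on the right. The name `result` is only for illustration.

```r
library(tidyr)

# Same example data as in the question
start_problem <- data.frame(
  continent = c("Europe", "Asia"),
  country = c("France;Germany;Italy", "Japan")
)

# names_sep = "" builds country1, country2, ... from the input column name;
# too_few = "align_start" fills the missing trailing pieces with NA
result <- separate_wider_delim(
  start_problem,
  country,
  delim = ";",
  names_sep = "",
  too_few = "align_start"
)
result
```

With this form there is no need to count the delimiters beforehand, since the number of output columns follows from the longest row.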
Answer 2
Score: 1
Another option is to first split the values into rows with `separate_rows()`, then create a column holding the names that `pivot_wider()` will use to make the data wider, like this:
library(tidyverse)
start_problem %>%
  separate_rows(country, sep = ";") %>%
  mutate(col_name = paste0("country", row_number()), .by = continent) %>%
  pivot_wider(names_from = col_name, values_from = country)
#> # A tibble: 2 × 4
#>   continent country1 country2 country3
#>   <chr>     <chr>    <chr>    <chr>   
#> 1 Europe    France   Germany  Italy   
#> 2 Asia      Japan    <NA>     <NA>
<sup>Created on 2023-03-31 with reprex v2.0.2</sup>
Answer 3
Score: 1
In base R:
cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE))
  continent     V1      V2    V3
1    Europe France Germany Italy
2      Asia  Japan              
If you strictly want `NA`, then use:
cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE, na.strings = ''))
  continent     V1      V2    V3
1    Europe France Germany Italy
2      Asia  Japan    <NA>  <NA>
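The `read.csv2()` approach names the new columns `V1`, `V2`, `V3`, and `read.table()` (which it wraps) infers the number of columns from the first five lines of input only. If the `country1`, `country2`, ... names from the question are wanted, or a long row might appear later in the data, a base R sketch with `strsplit()` is one possible alternative. The names `parts`, `n_cols`, `padded`, and `result_base` are only for illustration.

```r
# Same example data as in the question
start_problem <- data.frame(
  continent = c("Europe", "Asia"),
  country = c("France;Germany;Italy", "Japan")
)

# Split each string on the literal ";" into a list of character vectors
parts <- strsplit(start_problem$country, ";", fixed = TRUE)

# Pad every vector to the longest length; indexing past the end yields NA
n_cols <- max(lengths(parts))
padded <- do.call(rbind, lapply(parts, function(p) p[seq_len(n_cols)]))
colnames(padded) <- paste0("country", seq_len(n_cols))

# Bind the new columns back onto the untouched continent column
result_base <- cbind(start_problem["continent"], padded)
result_base
```

Because the padding relies on out-of-range indexing, missing entries come back as `NA` directly, without needing `na.strings`.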