英文:
How to split one column into multiple columns by delimiter (with different numbers of delimiter)
问题
我有一个类似这样的数据框:
continent <- c("Europe", "Asia")
country <- c("France;Germany;Italy", "Japan")
start_problem <- data.frame(continent, country)
start_problem
我想将`country`列中的值分隔到多个列中,每个国家对应一个列。最终结果应该如下:
continent <- c("Europe", "Asia")
country1 <- c("France", "Japan")
country2 <- c("Germany", NA)
country3 <- c("Italy", NA)
goal <- data.frame(continent, country1, country2, country3)
goal
使用 `separate_wider_delim()` 不起作用,因为并非每个大洲都有相同数量的国家,因此原始列中的分隔符数量也不同。
提前致谢
英文:
I have a dataframe like this one:
continent <- c("Europe", "Asia")
country <- c("France;Germany;Italy", "Japan")
start_problem <- data.frame(continent, country)
start_problem
I would like to seperate the values in the column country
to multiple columns, one for every country. The end product should look like:
continent <- c("Europe", "Asia")
country1 <- c("France", "Japan")
country2 <- c("Germany", NA)
country3 <- c("Italy", NA)
goal <- data.frame(continent, country1, country2, country3)
goal
Using separate_wider_delim()
does not work since not every continent has the same amount of countries, thus not the same amount of delimiters in the original column.
Thanks in advance
答案1
得分: 1
我们可以首先通过找到分隔符;
的最大出现次数来确定需要多少列。然后将这个信息粘贴到separate
函数的into =
参数中,与字符串"country"一起使用。
library(tidyverse)
col_number <- max(str_count(start_problem$country, ";") + 1)
start_problem %>% separate(country,
into = paste0("country", seq_len(col_number)),
sep = ";")
continent country1 country2 country3
1 Europe France Germany Italy
2 Asia Japan <NA> <NA>
英文:
We can first find out how many columns are needed by finding the max
number of occurrence of the delimiter ;
. Then paste
that information in the into =
parameter of separate
with the "country" string.
library(tidyverse)
col_number <- max(str_count(start_problem$country, ";") + 1)
start_problem %>% separate(country,
into = paste0("country", seq_len(col_number)),
sep = ";")
continent country1 country2 country3
1 Europe France Germany Italy
2 Asia Japan <NA> <NA>
答案2
得分: 1
另一个选项是首先使用separate_rows
将行分隔开。创建一个列,其中包含要在pivot_wider
中使用的名称,以使数据变得更宽,如下所示:
library(tidyverse)
start_problem %>%
separate_rows(country, sep = ";") %>%
mutate(col_name = paste0("country", row_number()), .by = continent) %>%
pivot_wider(names_from = col_name, values_from = country)
#> # A tibble: 2 × 4
#> continent country1 country2 country3
#> <chr> <chr> <chr> <chr>
#> 1 Europe France Germany Italy
#> 2 Asia Japan <NA> <NA>
创建于2023年3月31日,使用reprex v2.0.2
英文:
Another option by first separating the rows with separate_rows
. Create a column with the names to use for pivot_wider
to make your data wider like this:
library(tidyverse)
start_problem %>%
separate_rows(country, sep = ";") %>%
mutate(col_name = paste0("country", row_number()), .by = continent) %>%
pivot_wider(names_from = col_name, values_from = country)
#> # A tibble: 2 × 4
#> continent country1 country2 country3
#> <chr> <chr> <chr> <chr>
#> 1 Europe France Germany Italy
#> 2 Asia Japan <NA> <NA>
<sup>Created on 2023-03-31 with reprex v2.0.2</sup>
答案3
得分: 1
在基本R中:
cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE))
continent V1 V2 V3
1 欧洲 法国 德国 意大利
2 亚洲 日本
如果你严格地想要 `NA`,那么可以使用:
cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE, na.strings = '''))
continent V1 V2 V3
1 欧洲 法国 德国 意大利
2 亚洲 日本 <NA> <NA>
英文:
in Base R:
cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE))
continent V1 V2 V3
1 Europe France Germany Italy
2 Asia Japan
if you strictly want NA
then use
cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE, na.strings = ''))
continent V1 V2 V3
1 Europe France Germany Italy
2 Asia Japan <NA> <NA>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论