如何通过分隔符将单列拆分为多列(分隔符数量不同)

huangapple go评论74阅读模式
英文:

How to split one column into multiple columns by delimiter (with different numbers of delimiter)

问题

我有一个类似这样的数据框:

continent <- c("Europe", "Asia")
country <- c("France;Germany;Italy", "Japan")
start_problem <- data.frame(continent, country)
start_problem


我想将`country`列中的值分隔到多个列中,每个国家对应一个列。最终结果应该如下:

continent <- c("Europe", "Asia")
country1 <- c("France", "Japan")
country2 <- c("Germany", NA)
country3 <- c("Italy", NA)
goal <- data.frame(continent, country1, country2, country3)
goal


使用 `separate_wider_delim()` 不起作用,因为并非每个大洲都有相同数量的国家,因此原始列中的分隔符数量也不同。

提前致谢
英文:

I have a dataframe like this one:

continent &lt;- c(&quot;Europe&quot;, &quot;Asia&quot;)
country &lt;- c(&quot;France;Germany;Italy&quot;, &quot;Japan&quot;)
start_problem &lt;- data.frame(continent, country)
start_problem

I would like to seperate the values in the column country to multiple columns, one for every country. The end product should look like:

continent &lt;- c(&quot;Europe&quot;, &quot;Asia&quot;)
country1 &lt;- c(&quot;France&quot;, &quot;Japan&quot;)
country2 &lt;- c(&quot;Germany&quot;, NA)
country3 &lt;- c(&quot;Italy&quot;, NA)
goal &lt;- data.frame(continent, country1, country2, country3)
goal

Using separate_wider_delim() does not work since not every continent has the same amount of countries, thus not the same amount of delimiters in the original column.

Thanks in advance

答案1

得分: 1

我们可以首先通过找到分隔符;的最大出现次数来确定需要多少列。然后将这个信息粘贴到separate函数的into = 参数中,与字符串"country"一起使用。

library(tidyverse)

col_number <- max(str_count(start_problem$country, ";") + 1)

start_problem %>% separate(country, 
                           into = paste0("country", seq_len(col_number)), 
                           sep = ";")

  continent country1 country2 country3
1    Europe   France  Germany    Italy
2      Asia    Japan     <NA>     <NA>
英文:

We can first find out how many columns are needed by finding the max number of occurrence of the delimiter ;. Then paste that information in the into = parameter of separate with the "country" string.

library(tidyverse)

col_number &lt;- max(str_count(start_problem$country, &quot;;&quot;) + 1)

start_problem %&gt;% separate(country, 
                           into = paste0(&quot;country&quot;, seq_len(col_number)), 
                           sep = &quot;;&quot;)

  continent country1 country2 country3
1    Europe   France  Germany    Italy
2      Asia    Japan     &lt;NA&gt;     &lt;NA&gt;

答案2

得分: 1

另一个选项是首先使用separate_rows将行分隔开。创建一个列,其中包含要在pivot_wider中使用的名称,以使数据变得更宽,如下所示:

library(tidyverse)
start_problem %>%
  separate_rows(country, sep = ";") %>%
  mutate(col_name = paste0("country", row_number()), .by = continent) %>%
  pivot_wider(names_from = col_name, values_from = country)
#> # A tibble: 2 × 4
#>   continent country1 country2 country3
#>   <chr>     <chr>    <chr>    <chr>   
#> 1 Europe    France   Germany  Italy   
#> 2 Asia      Japan    <NA>     <NA>

创建于2023年3月31日,使用reprex v2.0.2

英文:

Another option by first separating the rows with separate_rows. Create a column with the names to use for pivot_wider to make your data wider like this:

library(tidyverse)
start_problem %&gt;%
  separate_rows(country, sep = &quot;;&quot;) %&gt;%
  mutate(col_name = paste0(&quot;country&quot;, row_number()), .by = continent) %&gt;%
  pivot_wider(names_from = col_name, values_from = country)
#&gt; # A tibble: 2 &#215; 4
#&gt;   continent country1 country2 country3
#&gt;   &lt;chr&gt;     &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;   
#&gt; 1 Europe    France   Germany  Italy   
#&gt; 2 Asia      Japan    &lt;NA&gt;     &lt;NA&gt;

<sup>Created on 2023-03-31 with reprex v2.0.2</sup>

答案3

得分: 1

在基本R中:

    cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE))
      continent     V1      V2    V3
    1    欧洲     法国     德国   意大利
    2    亚洲     日本              

如果你严格地想要 `NA`,那么可以使用:

    cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE, na.strings = '&#39;&#39;))
      continent     V1      V2    V3
    1    欧洲     法国     德国   意大利
    2    亚洲     日本    &lt;NA&gt;  &lt;NA&gt;
英文:

in Base R:

cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE))
  continent     V1      V2    V3
1    Europe France Germany Italy
2      Asia  Japan              

if you strictly want NA then use

cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE, na.strings = &#39;&#39;))
  continent     V1      V2    V3
1    Europe France Germany Italy
2      Asia  Japan    &lt;NA&gt;  &lt;NA&gt;

huangapple
  • 本文由 发表于 2023年3月31日 16:40:45
  • 转载请务必保留本文链接:https://go.coder-hub.com/75896504.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定