2023-03-31 16:40:45

How to split one column into multiple columns by delimiter (with different numbers of delimiters)

Question

I have a data frame like this one:

    continent <- c("Europe", "Asia")
    country <- c("France;Germany;Italy", "Japan")
    start_problem <- data.frame(continent, country)
    start_problem

I would like to separate the values in the `country` column into multiple columns, one for every country. The end product should look like this:

    continent <- c("Europe", "Asia")
    country1 <- c("France", "Japan")
    country2 <- c("Germany", NA)
    country3 <- c("Italy", NA)
    goal <- data.frame(continent, country1, country2, country3)
    goal

Using `separate_wider_delim()` does not work, since not every continent has the same number of countries and therefore the original column does not contain the same number of delimiters in every row.

Thanks in advance

Answer 1

Score: 1

We can first work out how many columns are needed by finding the maximum number of occurrences of the delimiter `;` and adding 1. That count is then used in the `into =` argument of `separate()`, pasted onto the string "country" to build the column names.

    library(tidyverse)

    col_number <- max(str_count(start_problem$country, ";") + 1)

    start_problem %>% separate(country, 
                               into = paste0("country", seq_len(col_number)), 
                               sep = ";")

      continent country1 country2 country3
    1    Europe   France  Germany    Italy
    2      Asia    Japan     <NA>     <NA>
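In tidyr 1.3.0 and later, `separate()` is marked as superseded in favor of the `separate_wider_*()` functions. As a hedged sketch (assuming tidyr >= 1.3.0 is installed), `separate_wider_delim()` can cope with rows that have fewer pieces when `too_few = "align_start"` is set, and `names_sep = ""` lets it generate the `country1`, `country2`, ... names without counting delimiters first. The name `result` is only illustrative.

```r
library(tidyr)

# Data from the question
continent <- c("Europe", "Asia")
country <- c("France;Germany;Italy", "Japan")
start_problem <- data.frame(continent, country)

# names_sep = "" builds names as <column name><piece number>;
# too_few = "align_start" fills missing trailing pieces with NA
result <- separate_wider_delim(
  start_problem,
  country,
  delim = ";",
  names_sep = "",
  too_few = "align_start"
)

result
```

This avoids the separate `str_count()` step, at the cost of requiring a recent tidyr version.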

Answer 2

Score: 1

Another option is to first split the values into rows with `separate_rows()`. Then create a column holding the names for `pivot_wider()` to use, which makes the data wider, like this:

    library(tidyverse)
    start_problem %>%
      separate_rows(country, sep = ";") %>%
      mutate(col_name = paste0("country", row_number()), .by = continent) %>%
      pivot_wider(names_from = col_name, values_from = country)
    #> # A tibble: 2 × 4
    #>   continent country1 country2 country3
    #>   <chr>     <chr>    <chr>    <chr>   
    #> 1 Europe    France   Germany  Italy   
    #> 2 Asia      Japan    <NA>     <NA>

<sup>Created on 2023-03-31 with reprex v2.0.2</sup>
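The `.by` argument of `mutate()` used above was introduced in dplyr 1.1.0. For older dplyr versions, the same numbering step can be sketched with `group_by()` and `ungroup()` instead. This is a minimal variant of the answer's pipeline, not a different method, and the name `result` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Data from the question
continent <- c("Europe", "Asia")
country <- c("France;Germany;Italy", "Japan")
start_problem <- data.frame(continent, country)

result <- start_problem %>%
  # One row per country
  separate_rows(country, sep = ";") %>%
  # Number the countries within each continent
  group_by(continent) %>%
  mutate(col_name = paste0("country", row_number())) %>%
  ungroup() %>%
  # Spread the numbered countries into columns; gaps become NA
  pivot_wider(names_from = col_name, values_from = country)

result
```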

Answer 3

Score: 1

In base R:

    cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE))
      continent     V1      V2    V3
    1    Europe France Germany Italy
    2      Asia  Japan              

If you strictly want `NA`, then use:

    cbind(start_problem[1], read.csv2(text=start_problem[,2], header = FALSE, na.strings = ''))
      continent     V1      V2    V3
    1    Europe France Germany Italy
    2      Asia  Japan    <NA>  <NA>
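The `read.csv2()` approach names the new columns `V1`, `V2`, `V3`. If the `country1`, `country2`, ... names from the question are wanted, one possible base R sketch splits with `strsplit()` and pads the shorter results with `NA`. This swaps in a different technique from the one in the answer; the names `parts`, `n_cols`, `padded` and `result` are illustrative.

```r
# Data from the question
continent <- c("Europe", "Asia")
country <- c("France;Germany;Italy", "Japan")
start_problem <- data.frame(continent, country)

# Split each string on the literal ";" delimiter
parts <- strsplit(start_problem$country, ";", fixed = TRUE)

# The longest split decides how many columns are needed
n_cols <- max(lengths(parts))

# `length<-` extends each vector to n_cols, padding with NA
padded <- do.call(rbind, lapply(parts, `length<-`, n_cols))
colnames(padded) <- paste0("country", seq_len(n_cols))

result <- cbind(start_problem["continent"], as.data.frame(padded))
result
```

Building the matrix with `do.call(rbind, ...)` rather than `sapply()` keeps the shape correct even when every row contains a single country.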
