2023年4月20日 07:04:23go评论77阅读模式

英文:

Split column in R to several columns based on column value

问题

我想在df中引入新列，基于一个列的值。

需要将category列按','拆分，并将每个拆分的子串放入新引入的适当列。

最小示例：

df <- data.frame(category=c('1, 2', '1, 3','3', '2, 3'),
                 othercolumn= c("Grealish", "Saka", "Henry", 'Jesus'))

所需df：

finaldf <- data.frame(category=c('1, 2', '1, 3','3', '2, 3'),
                      category1=c('1', '1',NA, NA),
                      category2=c('2',NA,NA, '2'),
                      category3=c(NA,'3','3', '3'),
                 othercolumn= c("Grealish", "Saka", "Henry", 'Jesus'))

category1、category2和category3列将根据category列的值取值，否则为NA。

英文:

I want to introduce new cols in df based on one column's values.

category col needs to be split on ',' and each split substring needs to go to newly introduced appropriate col.

Minimal example:

df &lt;- data.frame(category=c(&#39;1, 2&#39;, &#39;1, 3&#39;,&#39;3&#39;, &#39;2, 3&#39;),
                 othercolumn= c(&quot;Grealish&quot;, &quot;Saka&quot;, &quot;Henry&quot;, &#39;Jesus&#39;))

Required df:

finaldf &lt;- data.frame(category=c(&#39;1, 2&#39;, &#39;1, 3&#39;,&#39;3&#39;, &#39;2, 3&#39;),
                      category1=c(&#39;1&#39;, &#39;1&#39;,NA, NA),
                      category2=c(&#39;2&#39;,NA,NA, &#39;2&#39;),
                      category3=c(NA,&#39;3&#39;,&#39;3&#39;, &#39;3&#39;),
                 othercolumn= c(&quot;Grealish&quot;, &quot;Saka&quot;, &quot;Henry&quot;, &#39;Jesus&#39;))

category1, category2, and category3 cols will take values based on the category column, otherwise NA.

答案1

得分: 2

你可以使用separate然后pivot_wider：

library(tidyverse)

tibble(df) |&gt; 
  separate_longer_delim(category, delim = &quot;, &quot;) |&gt; 
  pivot_wider(names_from = category, values_from = category, names_prefix = &quot;category&quot;) |&gt; 
  select(starts_with(&#39;category&#39;), othercolumn)

一个tibble: 4 × 4

category1 category2 category3 othercolumn

1 1 2 NA Grealish
2 1 NA 3 Saka
3 NA NA 3 Henry
4 NA 2 3 Jesus



<details>
<summary>英文:</summary>

You can use `separate` and then `pivot_wider`:

library(tidyverse)

tibble(df) |>
separate_longer_delim(category, delim = ", ") |>
pivot_wider(names_from = category, values_from = category, names_prefix = "category") |>
select(starts_with('category'), othercolumn)

A tibble: 4 × 4

category1 category2 category3 othercolumn
<chr> <chr> <chr> <chr>
1 1 2 NA Grealish
2 1 NA 3 Saka
3 NA NA 3 Henry
4 NA 2 3 Jesus


</details>



# 答案2
**得分**: 2

在基本的R中，您可以执行以下操作：

使用`strsplt`来分割数字并将其转换为数值

在`lapply`中设置`setName`并使用`cbind`将其转换为数据框
```R
nums <- strsplit(df$category, ",") |> 
  type.convert(as.is = TRUE)

cbind(df[1], do.call(rbind, lapply(nums, function(x){
    sq <- seq(max(unlist(nums)))
    ifelse(sq %in% x , sq, NA ) |> 
      setNames(paste0("category", sq))
  })), df[-1])

  category category1 category2 category3 othercolumn
1     1, 2         1         2        NA    Grealish
2     1, 3         1        NA         3        Saka
3        3        NA        NA         3       Henry
4     2, 3        NA         2         3       Jesus

英文:

In base R you can do:

Using strsplt to split the numbers and convert it to numeric

setName the return in lapply and cbind to data frame

nums &lt;- strsplit(df$category, &quot;,&quot;) |&gt; 
  type.convert(as.is = T)

cbind(df[1], do.call(rbind, lapply(nums, \(x){
    sq &lt;- seq(max(unlist(nums)))
    ifelse(sq %in% x , sq, NA ) |&gt; 
      setNames(paste0(&quot;category&quot;, sq))
  })), df[-1])

  category category1 category2 category3 othercolumn
1     1, 2         1         2        NA    Grealish
2     1, 3         1        NA         3        Saka
3        3        NA        NA         3       Henry
4     2, 3        NA         2         3       Jesus

答案3

得分: 2

我们可以使用fastDummies来创建虚拟列，然后修改值

library(fastDummies)
library(dplyr)
df %>%
 dummy_cols('category', split = ",\\s*") %>%
 mutate(across(starts_with('category_'), ~ (NA^!.x) *
     readr::parse_number(cur_column()))) %>%
 relocate(othercolumn, .after = last_col())

-输出

  category category_1 category_2 category_3 othercolumn
1     1, 2          1          2         NA    Grealish
2     1, 3          1         NA          3        Saka
3        3         NA         NA          3       Henry
4     2, 3         NA          2          3       Jesus

英文:

We may use fastDummies to create dummy columns and then modify the values

library(fastDummies)
library(dplyr)
df %&gt;% 
 dummy_cols(&#39;category&#39;, split = &quot;,\\s*&quot;) %&gt;% 
 mutate(across(starts_with(&#39;category_&#39;), ~ (NA^!.x) * 
     readr::parse_number(cur_column()))) %&gt;%
 relocate(othercolumn, .after = last_col())

-output

  category category_1 category_2 category_3 othercolumn
1     1, 2          1          2         NA    Grealish
2     1, 3          1         NA          3        Saka
3        3         NA         NA          3       Henry
4     2, 3         NA          2          3       Jesus

答案4

得分: 2

以下是使用 data.table 中的 dcast 的选项：

dcast(
  setDT(df)[
    , strsplit(category, ", "),
    .(category, othercolumn)
  ][
    ,
    V2 := paste0("category_", V1)
  ],
  category + othercolumn ~ V2,
  value.var = "V1"
)

这将产生以下结果：

   category othercolumn category_1 category_2 category_3
1:     1, 2    Grealish          1          2       <NA>
2:     1, 3        Saka          1       <NA>          3
3:     2, 3       Jesus       <NA>          2          3
4:        3       Henry       <NA>       <NA>          3

希望这对你有所帮助。

英文:

Here is a data.table option using dcast

dcast(
  setDT(df)[
    , strsplit(category, &quot;, &quot;),
    .(category, othercolumn)
  ][
    ,
    V2 := paste0(&quot;category_&quot;, V1)
  ],
  category + othercolumn ~ V2,
  value.var = &quot;V1&quot;
)

which gives

   category othercolumn category_1 category_2 category_3
1:     1, 2    Grealish          1          2       &lt;NA&gt;
2:     1, 3        Saka          1       &lt;NA&gt;          3
3:     2, 3       Jesus       &lt;NA&gt;          2          3
4:        3       Henry       &lt;NA&gt;       &lt;NA&gt;          3

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

将R中的列根据列值拆分为多个列

问题

答案1

一个tibble: 4 × 4

A tibble: 4 × 4

答案3

答案4

In the R library quantmod, is there a new problem with getQuote?

如何用同一行中的列值替换列表中的列名？

R: save a regex match to a new variable while removing the regex match from the existing variable using `str_extract()`

如何将一个DataFrame的列值添加到另一个DataFrame的另一列中

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论