将R中的列根据列值拆分为多个列

huangapple go评论64阅读模式
英文:

Split column in R to several columns based on column value

问题

我想在df中引入新列,基于一个列的值。

需要将category列按','拆分,并将每个拆分的子串放入新引入的适当列。

最小示例:

df <- data.frame(category=c('1, 2', '1, 3','3', '2, 3'),
                 othercolumn= c("Grealish", "Saka", "Henry", 'Jesus'))

所需df:

finaldf <- data.frame(category=c('1, 2', '1, 3','3', '2, 3'),
                      category1=c('1', '1',NA, NA),
                      category2=c('2',NA,NA, '2'),
                      category3=c(NA,'3','3', '3'),
                 othercolumn= c("Grealish", "Saka", "Henry", 'Jesus'))

category1、category2和category3列将根据category列的值取值,否则为NA。

英文:

I want to introduce new cols in df based on one column's values.

category col needs to be split on ',' and each split substring needs to go to newly introduced appropriate col.

Minimal example:

df &lt;- data.frame(category=c(&#39;1, 2&#39;, &#39;1, 3&#39;,&#39;3&#39;, &#39;2, 3&#39;),
                 othercolumn= c(&quot;Grealish&quot;, &quot;Saka&quot;, &quot;Henry&quot;, &#39;Jesus&#39;))

Required df:

finaldf &lt;- data.frame(category=c(&#39;1, 2&#39;, &#39;1, 3&#39;,&#39;3&#39;, &#39;2, 3&#39;),
                      category1=c(&#39;1&#39;, &#39;1&#39;,NA, NA),
                      category2=c(&#39;2&#39;,NA,NA, &#39;2&#39;),
                      category3=c(NA,&#39;3&#39;,&#39;3&#39;, &#39;3&#39;),
                 othercolumn= c(&quot;Grealish&quot;, &quot;Saka&quot;, &quot;Henry&quot;, &#39;Jesus&#39;))

category1, category2, and category3 cols will take values based on the category column, otherwise NA.

答案1

得分: 2

你可以使用separate然后pivot_wider

library(tidyverse)

tibble(df) |&gt; 
  separate_longer_delim(category, delim = &quot;, &quot;) |&gt; 
  pivot_wider(names_from = category, values_from = category, names_prefix = &quot;category&quot;) |&gt; 
  select(starts_with(&#39;category&#39;), othercolumn)

一个tibble: 4 × 4

category1 category2 category3 othercolumn

1 1 2 NA Grealish
2 1 NA 3 Saka
3 NA NA 3 Henry
4 NA 2 3 Jesus



<details>
<summary>英文:</summary>

You can use `separate` and then `pivot_wider`:

library(tidyverse)

tibble(df) |>
separate_longer_delim(category, delim = ", ") |>
pivot_wider(names_from = category, values_from = category, names_prefix = "category") |>
select(starts_with('category'), othercolumn)

A tibble: 4 × 4

category1 category2 category3 othercolumn
<chr> <chr> <chr> <chr>
1 1 2 NA Grealish
2 1 NA 3 Saka
3 NA NA 3 Henry
4 NA 2 3 Jesus


</details>



# 答案2
**得分**: 2

在基本的R中,您可以执行以下操作:

使用`strsplt`来分割数字并将其转换为数值

在`lapply`中设置`setName`并使用`cbind`将其转换为数据框
```R
nums <- strsplit(df$category, ",") |> 
  type.convert(as.is = TRUE)

cbind(df[1], do.call(rbind, lapply(nums, function(x){
    sq <- seq(max(unlist(nums)))
    ifelse(sq %in% x , sq, NA ) |> 
      setNames(paste0("category", sq))
  })), df[-1])
  category category1 category2 category3 othercolumn
1     1, 2         1         2        NA    Grealish
2     1, 3         1        NA         3        Saka
3        3        NA        NA         3       Henry
4     2, 3        NA         2         3       Jesus
英文:

In base R you can do:

Using strsplt to split the numbers and convert it to numeric

setName the return in lapply and cbind to data frame

nums &lt;- strsplit(df$category, &quot;,&quot;) |&gt; 
  type.convert(as.is = T)

cbind(df[1], do.call(rbind, lapply(nums, \(x){
    sq &lt;- seq(max(unlist(nums)))
    ifelse(sq %in% x , sq, NA ) |&gt; 
      setNames(paste0(&quot;category&quot;, sq))
  })), df[-1])

  category category1 category2 category3 othercolumn
1     1, 2         1         2        NA    Grealish
2     1, 3         1        NA         3        Saka
3        3        NA        NA         3       Henry
4     2, 3        NA         2         3       Jesus

答案3

得分: 2

我们可以使用fastDummies来创建虚拟列,然后修改值

library(fastDummies)
library(dplyr)
df %>%
 dummy_cols('category', split = ",\\s*") %>%
 mutate(across(starts_with('category_'), ~ (NA^!.x) *
     readr::parse_number(cur_column()))) %>%
 relocate(othercolumn, .after = last_col())

-输出

  category category_1 category_2 category_3 othercolumn
1     1, 2          1          2         NA    Grealish
2     1, 3          1         NA          3        Saka
3        3         NA         NA          3       Henry
4     2, 3         NA          2          3       Jesus
英文:

We may use fastDummies to create dummy columns and then modify the values

library(fastDummies)
library(dplyr)
df %&gt;% 
 dummy_cols(&#39;category&#39;, split = &quot;,\\s*&quot;) %&gt;% 
 mutate(across(starts_with(&#39;category_&#39;), ~ (NA^!.x) * 
     readr::parse_number(cur_column()))) %&gt;%
 relocate(othercolumn, .after = last_col())

-output

  category category_1 category_2 category_3 othercolumn
1     1, 2          1          2         NA    Grealish
2     1, 3          1         NA          3        Saka
3        3         NA         NA          3       Henry
4     2, 3         NA          2          3       Jesus

答案4

得分: 2

以下是使用 data.table 中的 dcast 的选项:

dcast(
  setDT(df)[
    , strsplit(category, ", "),
    .(category, othercolumn)
  ][
    ,
    V2 := paste0("category_", V1)
  ],
  category + othercolumn ~ V2,
  value.var = "V1"
)

这将产生以下结果:

   category othercolumn category_1 category_2 category_3
1:     1, 2    Grealish          1          2       <NA>
2:     1, 3        Saka          1       <NA>          3
3:     2, 3       Jesus       <NA>          2          3
4:        3       Henry       <NA>       <NA>          3

希望这对你有所帮助。

英文:

Here is a data.table option using dcast

dcast(
  setDT(df)[
    , strsplit(category, &quot;, &quot;),
    .(category, othercolumn)
  ][
    ,
    V2 := paste0(&quot;category_&quot;, V1)
  ],
  category + othercolumn ~ V2,
  value.var = &quot;V1&quot;
)

which gives

   category othercolumn category_1 category_2 category_3
1:     1, 2    Grealish          1          2       &lt;NA&gt;
2:     1, 3        Saka          1       &lt;NA&gt;          3
3:     2, 3       Jesus       &lt;NA&gt;          2          3
4:        3       Henry       &lt;NA&gt;       &lt;NA&gt;          3

huangapple
  • 本文由 发表于 2023年4月20日 07:04:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/76059426.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定