英文:
Split column in R to several columns based on column value
问题
我想在df中引入新列,基于一个列的值。
需要将category列按','拆分,并将每个拆分的子串放入新引入的适当列。
最小示例:
df <- data.frame(category=c('1, 2', '1, 3','3', '2, 3'),
othercolumn= c("Grealish", "Saka", "Henry", 'Jesus'))
所需df:
finaldf <- data.frame(category=c('1, 2', '1, 3','3', '2, 3'),
category1=c('1', '1',NA, NA),
category2=c('2',NA,NA, '2'),
category3=c(NA,'3','3', '3'),
othercolumn= c("Grealish", "Saka", "Henry", 'Jesus'))
category1、category2和category3列将根据category列的值取值,否则为NA。
英文:
I want to introduce new cols in df based on one column's values.
category col needs to be split on ',' and each split substring needs to go to newly introduced appropriate col.
Minimal example:
df <- data.frame(category=c('1, 2', '1, 3','3', '2, 3'),
othercolumn= c("Grealish", "Saka", "Henry", 'Jesus'))
Required df:
finaldf <- data.frame(category=c('1, 2', '1, 3','3', '2, 3'),
category1=c('1', '1',NA, NA),
category2=c('2',NA,NA, '2'),
category3=c(NA,'3','3', '3'),
othercolumn= c("Grealish", "Saka", "Henry", 'Jesus'))
category1, category2, and category3 cols will take values based on the category column, otherwise NA.
答案1
得分: 2
你可以使用separate
然后pivot_wider
:
library(tidyverse)
tibble(df) |>
separate_longer_delim(category, delim = ", ") |>
pivot_wider(names_from = category, values_from = category, names_prefix = "category") |>
select(starts_with('category'), othercolumn)
一个tibble: 4 × 4
category1 category2 category3 othercolumn
1 1 2 NA Grealish
2 1 NA 3 Saka
3 NA NA 3 Henry
4 NA 2 3 Jesus
<details>
<summary>英文:</summary>
You can use `separate` and then `pivot_wider`:
library(tidyverse)
tibble(df) |>
separate_longer_delim(category, delim = ", ") |>
pivot_wider(names_from = category, values_from = category, names_prefix = "category") |>
select(starts_with('category'), othercolumn)
A tibble: 4 × 4
category1 category2 category3 othercolumn
<chr> <chr> <chr> <chr>
1 1 2 NA Grealish
2 1 NA 3 Saka
3 NA NA 3 Henry
4 NA 2 3 Jesus
</details>
# 答案2
**得分**: 2
在基本的R中,您可以执行以下操作:
使用`strsplt`来分割数字并将其转换为数值
在`lapply`中设置`setName`并使用`cbind`将其转换为数据框
```R
nums <- strsplit(df$category, ",") |>
type.convert(as.is = TRUE)
cbind(df[1], do.call(rbind, lapply(nums, function(x){
sq <- seq(max(unlist(nums)))
ifelse(sq %in% x , sq, NA ) |>
setNames(paste0("category", sq))
})), df[-1])
category category1 category2 category3 othercolumn
1 1, 2 1 2 NA Grealish
2 1, 3 1 NA 3 Saka
3 3 NA NA 3 Henry
4 2, 3 NA 2 3 Jesus
英文:
In base R you can do:
Using strsplt
to split the numbers and convert it to numeric
setName
the return in lapply
and cbind
to data frame
nums <- strsplit(df$category, ",") |>
type.convert(as.is = T)
cbind(df[1], do.call(rbind, lapply(nums, \(x){
sq <- seq(max(unlist(nums)))
ifelse(sq %in% x , sq, NA ) |>
setNames(paste0("category", sq))
})), df[-1])
category category1 category2 category3 othercolumn
1 1, 2 1 2 NA Grealish
2 1, 3 1 NA 3 Saka
3 3 NA NA 3 Henry
4 2, 3 NA 2 3 Jesus
答案3
得分: 2
我们可以使用fastDummies
来创建虚拟列,然后修改值
library(fastDummies)
library(dplyr)
df %>%
dummy_cols('category', split = ",\\s*") %>%
mutate(across(starts_with('category_'), ~ (NA^!.x) *
readr::parse_number(cur_column()))) %>%
relocate(othercolumn, .after = last_col())
-输出
category category_1 category_2 category_3 othercolumn
1 1, 2 1 2 NA Grealish
2 1, 3 1 NA 3 Saka
3 3 NA NA 3 Henry
4 2, 3 NA 2 3 Jesus
英文:
We may use fastDummies
to create dummy columns and then modify the values
library(fastDummies)
library(dplyr)
df %>%
dummy_cols('category', split = ",\\s*") %>%
mutate(across(starts_with('category_'), ~ (NA^!.x) *
readr::parse_number(cur_column()))) %>%
relocate(othercolumn, .after = last_col())
-output
category category_1 category_2 category_3 othercolumn
1 1, 2 1 2 NA Grealish
2 1, 3 1 NA 3 Saka
3 3 NA NA 3 Henry
4 2, 3 NA 2 3 Jesus
答案4
得分: 2
以下是使用 data.table
中的 dcast
的选项:
dcast(
setDT(df)[
, strsplit(category, ", "),
.(category, othercolumn)
][
,
V2 := paste0("category_", V1)
],
category + othercolumn ~ V2,
value.var = "V1"
)
这将产生以下结果:
category othercolumn category_1 category_2 category_3
1: 1, 2 Grealish 1 2 <NA>
2: 1, 3 Saka 1 <NA> 3
3: 2, 3 Jesus <NA> 2 3
4: 3 Henry <NA> <NA> 3
希望这对你有所帮助。
英文:
Here is a data.table
option using dcast
dcast(
setDT(df)[
, strsplit(category, ", "),
.(category, othercolumn)
][
,
V2 := paste0("category_", V1)
],
category + othercolumn ~ V2,
value.var = "V1"
)
which gives
category othercolumn category_1 category_2 category_3
1: 1, 2 Grealish 1 2 <NA>
2: 1, 3 Saka 1 <NA> 3
3: 2, 3 Jesus <NA> 2 3
4: 3 Henry <NA> <NA> 3
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论