Error while splitting into new row with comma as delimiter
Question
I have the following data frame:
temp = structure(list(pid = c("s1", "s1", "s1"),
                      LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"),
                      RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1")),
                 row.names = c(1L, 2L, 3L), class = "data.frame")
  pid                     LEFT_GENE          RIGHT_GENE
1  s1                         PTPRO
2  s1                          EPS8               FOx,D
3  s1 DPY19L2,AC084357.2,AC027667.1 DPY19L2P2,S100A11P1
I want to split each comma-delimited item into a new row and create every new combination.
For example, the last row should create 6 new rows. However, I'm getting an error I don't understand:
temp %>%
  separate_rows(LEFT_GENE:RIGHT_GENE, sep = ",") %>%
  data.frame(stringsAsFactors = FALSE)
Error in `fn()`:
! In row 3, can't recycle input of size 3 to size 2.
Run `rlang::last_error()` to see where the error occurred.
However, the error seems to come from row 3, since rows 1:2 work fine:
temp[1:2, ] %>%
  separate_rows(LEFT_GENE:RIGHT_GENE, sep = ",") %>%
  data.frame(stringsAsFactors = FALSE)
  pid LEFT_GENE RIGHT_GENE
1  s1     PTPRO
2  s1      EPS8        FOx
3  s1      EPS8          D
Does anyone know what the issue is?
Answer 1
Score: 3
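separate_rows() splits all selected columns in parallel, so within a row they must yield the same number of pieces (or a single piece, which is recycled). Row 3 has three pieces on the left and two on the right, hence the error. To get every combination, separate one column at a time:
temp %>%
  separate_rows(RIGHT_GENE) %>%
  separate_rows(LEFT_GENE)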
# A tibble: 9 × 3
  pid   LEFT_GENE  RIGHT_GENE
  <chr> <chr>      <chr>
1 s1    PTPRO      ""
2 s1    EPS8       "FOx"
3 s1    EPS8       "D"
4 s1    DPY19L2    "DPY19L2P2"
5 s1    AC084357.2 "DPY19L2P2"
6 s1    AC027667.1 "DPY19L2P2"
7 s1    DPY19L2    "S100A11P1"
8 s1    AC084357.2 "S100A11P1"
9 s1    AC027667.1 "S100A11P1"
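The calls above rely on the default separator of separate_rows(), which splits on any run of characters that are neither alphanumeric nor a dot. That works for this sample, but a gene symbol containing a hyphen would also be split. A minimal sketch of the same one-column-at-a-time approach with the comma passed explicitly; the name res is only illustrative:

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt with data.frame() for readability.
temp <- data.frame(
  pid = c("s1", "s1", "s1"),
  LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"),
  RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1"),
  stringsAsFactors = FALSE
)

# Separate one column per call so each split is crossed with the other column,
# and pass sep = "," so only commas act as delimiters.
res <- temp %>%
  separate_rows(RIGHT_GENE, sep = ",") %>%
  separate_rows(LEFT_GENE, sep = ",")

print(res)
```

For the sample data this should give the same nine combinations as the output above.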
Answer 2
Score: 1
If we need 6 rows, an option is:
library(dplyr)
library(tidyr)
library(purrr)

temp %>%
  # split both gene columns into list-columns and record the longer length per row
  mutate(across(ends_with("_GENE"), ~ strsplit(.x, split = ",")),
         cnt = pmax(lengths(LEFT_GENE), lengths(RIGHT_GENE))) %>%
  # pad the shorter list in each row with NA up to cnt
  mutate(across(ends_with("_GENE"),
                ~ map2(.x, cnt, ~ `length<-`(.x, .y)))) %>%
  select(-cnt) %>%
  unnest_longer(where(is.list))
Output:
# A tibble: 6 × 3
  pid   LEFT_GENE  RIGHT_GENE
  <chr> <chr>      <chr>
1 s1    PTPRO      <NA>
2 s1    EPS8       FOx
3 s1    <NA>       D
4 s1    DPY19L2    DPY19L2P2
5 s1    AC084357.2 S100A11P1
6 s1    AC027667.1 <NA>
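The NA entries in this output come from the padding step, which uses the base R replacement function `length<-` to stretch the shorter vector in each row. A small base R sketch of that behaviour on its own; the variable names are only illustrative:

```r
# Base R only: `length<-` returns a copy of the vector with the requested length.
genes <- c("FOx", "D")

# Growing a character vector pads the new positions with NA.
padded <- `length<-`(genes, 3)

# Requesting the current length leaves the vector unchanged.
same <- `length<-`(genes, 2)

# An empty vector (what strsplit() returns for "") becomes a single NA.
from_empty <- `length<-`(character(0), 1)

print(padded)
print(from_empty)
```

This is why row 1, whose RIGHT_GENE is an empty string, ends up with NA rather than "" in this approach.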
If the NAs should be replaced by the previous non-NA value, add fill() at the end:
... %>%
  fill(ends_with("_GENE"))
# A tibble: 6 × 3
  pid   LEFT_GENE  RIGHT_GENE
  <chr> <chr>      <chr>
1 s1    PTPRO      <NA>
2 s1    EPS8       FOx
3 s1    EPS8       D
4 s1    DPY19L2    DPY19L2P2
5 s1    AC084357.2 S100A11P1
6 s1    AC027667.1 S100A11P1