Error while splitting into new rows with comma as delimiter

Question

I have the following data frame:

    temp = structure(list(pid = c("s1", "s1", "s1"),
                          LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"),
                          RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1")),
                     row.names = c(1L, 2L, 3L), class = "data.frame")

      pid                     LEFT_GENE          RIGHT_GENE
    1  s1                         PTPRO
    2  s1                          EPS8               FOx,D
    3  s1 DPY19L2,AC084357.2,AC027667.1 DPY19L2P2,S100A11P1

I want to split each comma-delimited item into its own row and create every combination of the two columns.
For example, the last row should produce 6 rows (3 left genes × 2 right genes). However, I get an error that I don't understand.

    temp %>%
      separate_rows(LEFT_GENE:RIGHT_GENE, sep = ",") %>%
      data.frame(stringsAsFactors = FALSE)

    Error in `fn()`:
    ! In row 3, can't recycle input of size 3 to size 2.
    Run `rlang::last_error()` to see where the error occurred.

However, the error seems to come from row 3, since rows 1:2 work fine:

    > temp[1:2, ] %>%
    +   separate_rows(LEFT_GENE:RIGHT_GENE, sep = ",") %>%
    +   data.frame(stringsAsFactors = FALSE)
      pid LEFT_GENE RIGHT_GENE
    1  s1     PTPRO
    2  s1      EPS8        FOx
    3  s1      EPS8          D

Does anyone know what the issue is?

Answer 1

Score: 3

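Separate one column at a time. When several columns are passed to a single separate_rows() call, they are split in parallel, so within each row the columns must yield the same number of pieces (a single piece is recycled). Row 3 yields 3 and 2 pieces, hence the error. Chaining two calls gives every combination: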

    # sep = "," is set explicitly; the default pattern splits on any
    # non-alphanumeric character other than "."
    temp %>%
      separate_rows(RIGHT_GENE, sep = ",") %>%
      separate_rows(LEFT_GENE, sep = ",")

    # A tibble: 9 × 3
      pid   LEFT_GENE  RIGHT_GENE
      <chr> <chr>      <chr>
    1 s1    PTPRO      ""
    2 s1    EPS8       "FOx"
    3 s1    EPS8       "D"
    4 s1    DPY19L2    "DPY19L2P2"
    5 s1    AC084357.2 "DPY19L2P2"
    6 s1    AC027667.1 "DPY19L2P2"
    7 s1    DPY19L2    "S100A11P1"
    8 s1    AC084357.2 "S100A11P1"
    9 s1    AC027667.1 "S100A11P1"
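
One way to see why the single call fails, and why the chained version returns 9 rows, is to count the comma-separated pieces in each row. The following is a base-R sketch that only mirrors the idea and does not call tidyr; `count_pieces` is a helper name made up for this illustration.

```r
# Data from the question
temp <- data.frame(
  pid = c("s1", "s1", "s1"),
  LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"),
  RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pieces per cell = number of commas + 1
# (an empty string counts as one empty piece)
count_pieces <- function(x) {
  lengths(regmatches(x, gregexpr(",", x, fixed = TRUE))) + 1L
}

n_left  <- count_pieces(temp$LEFT_GENE)   # 1 1 3
n_right <- count_pieces(temp$RIGHT_GENE)  # 1 2 2

# A parallel split needs equal counts, or a count of 1 that can be recycled
ok <- n_left == n_right | n_left == 1L | n_right == 1L
which(!ok)             # 3, the row named in the error message

# Splitting one column at a time crosses the pieces instead
n_left * n_right       # 1 2 6 rows per original row
sum(n_left * n_right)  # 9 rows in total
```

Row 3 contributes 3 × 2 = 6 of the 9 rows, which is the count the question asks for.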

Answer 2

Score: 1

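If only 6 rows are needed in total (pairing the pieces by position instead of crossing them), one option is to pad the shorter side of each row with NA: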

    library(dplyr)
    library(tidyr)
    library(purrr)

    temp %>%
      # split both columns into list-columns and record the longer length per row
      mutate(across(ends_with("_GENE"), ~ strsplit(.x, split = ",")),
             cnt = pmax(lengths(LEFT_GENE), lengths(RIGHT_GENE))) %>%
      # pad every list element with NA up to that length
      mutate(across(ends_with("_GENE"),
                    ~ map2(.x, cnt, ~ `length<-`(.x, .y)))) %>%
      select(-cnt) %>%
      unnest_longer(where(is.list))

Output:

    # A tibble: 6 × 3
      pid   LEFT_GENE  RIGHT_GENE
      <chr> <chr>      <chr>
    1 s1    PTPRO      <NA>
    2 s1    EPS8       FOx
    3 s1    <NA>       D
    4 s1    DPY19L2    DPY19L2P2
    5 s1    AC084357.2 S100A11P1
    6 s1    AC027667.1 <NA>
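
The padding step relies on the fact that assigning a longer length to a vector fills the new positions with NA. A minimal base-R sketch of that step for row 2 of the question's data (the variable names are only illustrative):

```r
# Row 2 of the question's data, split on commas
left  <- strsplit("EPS8", ",")[[1]]   # "EPS8"
right <- strsplit("FOx,D", ",")[[1]]  # "FOx" "D"

# Target length is the longer of the two sides
n <- max(length(left), length(right))

# `length<-` extends a vector and pads the new slots with NA
left_padded  <- `length<-`(left, n)   # "EPS8" NA
right_padded <- `length<-`(right, n)  # unchanged: "FOx" "D"

# An empty cell splits to character(0), so padding it to length 1 gives NA;
# this is why row 1 shows <NA> rather than "" in RIGHT_GENE
empty_padded <- `length<-`(strsplit("", ",")[[1]], 1L)
```

After padding, both sides of every row have the same length, so they can be unnested together without a recycling error.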

If the NAs should be replaced by the previous non-NA value, add fill() at the end:

    ... %>%
      fill(ends_with("_GENE"))

    # A tibble: 6 × 3
      pid   LEFT_GENE  RIGHT_GENE
      <chr> <chr>      <chr>
    1 s1    PTPRO      <NA>
    2 s1    EPS8       FOx
    3 s1    EPS8       D
    4 s1    DPY19L2    DPY19L2P2
    5 s1    AC084357.2 S100A11P1
    6 s1    AC027667.1 S100A11P1
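
What a downward fill does to one column can be sketched in base R as follows. This is an illustration of the behaviour only, not tidyr's implementation, and `fill_down` is a made-up name.

```r
# Hypothetical helper: carry the last non-NA value forward
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))       # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1L]  # slot 1 stays NA for leading gaps
}

# RIGHT_GENE column from the padded output
right <- c(NA, "FOx", "D", "DPY19L2P2", "S100A11P1", NA)
filled <- fill_down(right)
# NA "FOx" "D" "DPY19L2P2" "S100A11P1" "S100A11P1"
```

The leading NA in row 1 stays NA because there is no earlier value to carry forward, which matches the output shown for the fill step.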

huangapple
  • Posted on February 14, 2023, 02:04:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75439645.html