February 14, 2023 02:04:34

Error while splitting into new row with comma as delimiter

Question

I have the following data frame:

temp = structure(list(pid = c("s1", "s1", "s1"), LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"
), RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1")), row.names = c(1L, 2L, 3L), class = "data.frame")
  pid                     LEFT_GENE          RIGHT_GENE
1  s1                         PTPRO                    
2  s1                          EPS8               FOx,D
3  s1 DPY19L2,AC084357.2,AC027667.1 DPY19L2P2,S100A11P1

I want to split each comma-delimited item into its own row and create all the new combinations.
For example, the last row (3 LEFT_GENE values × 2 RIGHT_GENE values) should produce 6 new rows. However, I'm getting an error I don't understand:

temp %>%
  separate_rows(LEFT_GENE:RIGHT_GENE, sep = ",") %>%
  data.frame(stringsAsFactors = FALSE)
Error in `fn()`:
! In row 3, can't recycle input of size 3 to size 2.
Run `rlang::last_error()` to see where the error occurred.

However, the error seems to come from row 3, since rows 1:2 work fine:

> temp[1:2, ] %>%
+   separate_rows(LEFT_GENE:RIGHT_GENE, sep = ",") %>%
+   data.frame(stringsAsFactors = FALSE)
  pid LEFT_GENE RIGHT_GENE
1  s1     PTPRO           
2  s1      EPS8        FOx
3  s1      EPS8          D
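The error message points to a per-row size check. Assuming separate_rows() follows the usual tidyverse recycling rule (pieces from the different columns must have the same count, or a count of 1, which is recycled; that is why EPS8 appears twice above), the following base R sketch reproduces that check on this data. The helper n_pieces() is hypothetical and simply counts commas:

```r
# The question's data, rebuilt as a plain data frame
temp <- data.frame(
  pid = c("s1", "s1", "s1"),
  LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"),
  RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of comma-separated pieces per string
# (commas + 1, so an empty string counts as a single piece)
n_pieces <- function(x) nchar(gsub("[^,]", "", x)) + 1L

left_n  <- n_pieces(temp$LEFT_GENE)
right_n <- n_pieces(temp$RIGHT_GENE)

# Pieces split in parallel are compatible if the counts match or one of them is 1
compatible <- left_n == right_n | left_n == 1L | right_n == 1L

left_n              # 1 1 3
right_n             # 1 2 2
which(!compatible)  # 3
```

Row 3 is the only row where 3 pieces meet 2 pieces, which matches the "can't recycle input of size 3 to size 2" message.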

Does anyone know what the issue is?

Answer 1

Score: 3

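separate_rows() splits all the listed columns in parallel, so within each row they must have the same number of pieces (or a single piece, which is recycled); row 3 has 3 LEFT_GENE pieces but only 2 RIGHT_GENE pieces. To get every combination, separate one column at a time: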

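library(tidyr)

temp %>%
  separate_rows(RIGHT_GENE, sep = ",") %>%  # one row per RIGHT_GENE piece; LEFT_GENE is copied unchanged
  separate_rows(LEFT_GENE, sep = ",")       # then one row per LEFT_GENE piece: every LEFT x RIGHT pair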
# A tibble: 9 × 3
  pid   LEFT_GENE  RIGHT_GENE 
  <chr> <chr>      <chr>      
1 s1    PTPRO      ""         
2 s1    EPS8       "FOx"      
3 s1    EPS8       "D"        
4 s1    DPY19L2    "DPY19L2P2"
5 s1    AC084357.2 "DPY19L2P2"
6 s1    AC027667.1 "DPY19L2P2"
7 s1    DPY19L2    "S100A11P1"
8 s1    AC084357.2 "S100A11P1"
9 s1    AC027667.1 "S100A11P1"
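For comparison, here is a base R sketch of the same all-combinations result (not part of the original answer): split each row's strings, then take their cross product with expand.grid(). The split_pieces() helper is hypothetical; it keeps an empty string as one empty piece so that row 1 survives with RIGHT_GENE "", as in the tibble above.

```r
temp <- data.frame(
  pid = c("s1", "s1", "s1"),
  LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"),
  RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one string on commas,
# keeping "" as a single empty piece instead of dropping it
split_pieces <- function(x) {
  pieces <- strsplit(x, ",", fixed = TRUE)[[1]]
  if (length(pieces) == 0) "" else pieces
}

# For each input row, build every LEFT x RIGHT combination of its pieces
rows <- lapply(seq_len(nrow(temp)), function(i) {
  combos <- expand.grid(
    LEFT_GENE = split_pieces(temp$LEFT_GENE[i]),
    RIGHT_GENE = split_pieces(temp$RIGHT_GENE[i]),
    stringsAsFactors = FALSE
  )
  data.frame(pid = temp$pid[i], combos, stringsAsFactors = FALSE)
})

result <- do.call(rbind, rows)
rownames(result) <- NULL
result
```

expand.grid() varies its first argument fastest, so the nine rows come out in the same order as the tibble above.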

Answer 2

Score: 1

If we need only 6 rows (pairing the pieces by position rather than forming every combination), one option is:

library(dplyr)
library(tidyr)
library(purrr)

temp %>%
  # split each *_GENE string into a list-column of character vectors
  mutate(across(ends_with("_GENE"), ~ strsplit(.x, split = ",")),
         # length of the longer piece vector in each row
         cnt = pmax(lengths(LEFT_GENE), lengths(RIGHT_GENE))) %>%
  # pad both list-columns to cnt elements; `length<-` fills with NA
  mutate(across(ends_with("_GENE"),
                ~ map2(.x, cnt, ~ `length<-`(.x, .y)))) %>%
  select(-cnt) %>%
  # unnest both list-columns together: one row per position
  unnest_longer(where(is.list))

Output:

# A tibble: 6 × 3
  pid   LEFT_GENE  RIGHT_GENE
  <chr> <chr>      <chr>     
1 s1    PTPRO      <NA>      
2 s1    EPS8       FOx       
3 s1    <NA>       D         
4 s1    DPY19L2    DPY19L2P2 
5 s1    AC084357.2 S100A11P1 
6 s1    AC027667.1 <NA>
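A base R sketch of the same position-wise pairing (an illustration, not the answer's code): split both strings, pad the shorter piece vector with NA via `length<-`, and bind the rows. It relies on strsplit() returning character(0) for the empty RIGHT_GENE string in row 1, which is why that cell becomes NA, matching the output above.

```r
temp <- data.frame(
  pid = c("s1", "s1", "s1"),
  LEFT_GENE = c("PTPRO", "EPS8", "DPY19L2,AC084357.2,AC027667.1"),
  RIGHT_GENE = c("", "FOx,D", "DPY19L2P2,S100A11P1"),
  stringsAsFactors = FALSE
)

# Pair pieces by position; the shorter side is padded with NA
rows <- lapply(seq_len(nrow(temp)), function(i) {
  left  <- strsplit(temp$LEFT_GENE[i], ",", fixed = TRUE)[[1]]
  right <- strsplit(temp$RIGHT_GENE[i], ",", fixed = TRUE)[[1]]  # "" gives character(0)
  n <- max(length(left), length(right))
  length(left)  <- n  # extends with NA when shorter
  length(right) <- n
  data.frame(pid = rep(temp$pid[i], n), LEFT_GENE = left, RIGHT_GENE = right,
             stringsAsFactors = FALSE)
})

result <- do.call(rbind, rows)
rownames(result) <- NULL
result
```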

If each NA should be replaced by the previous non-NA value, add fill() at the end:

...
%>% fill(ends_with("_GENE"))
# A tibble: 6 × 3
  pid   LEFT_GENE  RIGHT_GENE
  <chr> <chr>      <chr>     
1 s1    PTPRO      <NA>      
2 s1    EPS8       FOx       
3 s1    EPS8       D         
4 s1    DPY19L2    DPY19L2P2 
5 s1    AC084357.2 S100A11P1 
6 s1    AC027667.1 S100A11P1
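fill() uses the "down" direction by default, carrying the last non-NA value forward within each column; a leading NA has nothing before it and stays NA, which is why row 1 of RIGHT_GENE is still NA. A minimal base R sketch of that behaviour, using a hypothetical fill_down() helper on the column values from the output above:

```r
# Hypothetical helper: replace each NA with the closest preceding non-NA value
fill_down <- function(x) {
  for (i in seq_along(x)[-1]) {
    # x[i - 1] has already been filled, so runs of NAs are carried too
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

left  <- c("PTPRO", "EPS8", NA, "DPY19L2", "AC084357.2", "AC027667.1")
right <- c(NA, "FOx", "D", "DPY19L2P2", "S100A11P1", NA)

fill_down(left)   # "PTPRO" "EPS8" "EPS8" "DPY19L2" "AC084357.2" "AC027667.1"
fill_down(right)  # NA "FOx" "D" "DPY19L2P2" "S100A11P1" "S100A11P1"
```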
