June 13, 2023, 04:35:15

How to replace an entire row between two rows based on a column

Question

I am dealing with a very large mRNA splicing dataset. Here is a toy dataset to exemplify the problem:

test_df <- data.frame(
  start = c(2, 9, 13, 19, 13, 20, 25, 35, 39),
  end = c(8, 12, 18, 24, 16, 24, 30, 38, 45),
  gene_id = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  exon_identity = c(NA, "Upstream", NA, "Downstream", "Event", NA, "Upstream", "Downstream", NA)
)
> test_df
  start end gene_id exon_identity
1     2   8       A          <NA>
2     9  12       A      Upstream
3    13  18       A          <NA>
4    19  24       A    Downstream
5    13  16       A         Event
6    20  24       B          <NA>
7    25  30       B      Upstream
8    35  38       B    Downstream
9    39  45       B          <NA>

For every unique value in the gene_id column, I would like to replace any row that lies between the "Upstream" and "Downstream" rows in the exon_identity column with that gene's "Event" row, i.e. replace row 3 with row 5. What makes this difficult is that some genes have no row that needs replacing, e.g. "B" in the gene_id column.

This question goes in the direction of previously asked questions here and here.

Based on those and other resources, I have tried:

library(tidyverse)
test_replace <- test_df %>%
  group_by(gene_id) %>%
  mutate(start = replace(start, row_number() > which(exon_identity == "Upstream") & row_number() < which(exon_identity == "Downstream"), start[exon_identity == "Event"]),
         end = replace(end, row_number() > which(exon_identity == "Upstream") & row_number() < which(exon_identity == "Downstream"), end[exon_identity == "Event"]),
         exon_identity = replace(exon_identity, row_number() > which(exon_identity == "Upstream") & row_number() < which(exon_identity == "Downstream"), "Event")
         )
Warning message:
There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `start = replace(...)`.
ℹ In group 1: `gene_id = "A"`.
Caused by warning in `x[list] <- values`:
! number of items to replace is not a multiple of replacement length
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning.
>
> test_replace
# A tibble: 9 × 4
# Groups:   gene_id [2]
  start   end gene_id exon_identity
  <dbl> <dbl> <chr>   <chr>
1     2     8 A       NA
2     9    12 A       Upstream
3    NA    NA A       Event
4    19    24 A       Downstream
5    13    16 A       Event
6    20    24 B       NA
7    25    30 B       Upstream
8    35    38 B       Downstream
9    39    45 B       NA
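
The NA values in row 3 most likely come from indexing with a logical vector that itself contains NA. The following is a minimal base-R sketch of that effect, using only group A's values from the toy data set; it is an illustration of the suspected cause, not a full fix for the pipeline above.

```r
# Group A's values from the toy data set
start <- c(2, 9, 13, 19, 13)
exon_identity <- c(NA, "Upstream", NA, "Downstream", "Event")

# Comparing NA with "Event" gives NA, and an NA index yields an NA element,
# so this returns three values (NA NA 13) instead of one
start[exon_identity == "Event"]

# replace() then receives three values for a single target row, warns about
# the length mismatch, and uses the first value, which is NA

# %in% never returns NA, so exactly one value is selected here
start[exon_identity %in% "Event"]
```

Swapping `== "Event"` for `%in% "Event"` (or wrapping the comparison in `which()`) removes the stray NA values, although the original "Event" row would still need to be dropped afterwards.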

Desired output:


> desired_outcome
  start end gene_id exon_identity
1     2   8       A          <NA>
2     9  12       A      Upstream
3    13  16       A         Event
4    19  24       A    Downstream
5    20  24       B          <NA>
6    25  30       B      Upstream
7    35  38       B    Downstream
8    39  45       B          <NA>

A solution, preferably using the tidyverse, would be highly appreciated.

Thank you!

Answer 1

Score: 2

In the toy example, reordering your data set gives you almost all of what you want. Will that work in the real data set? E.g.

library(tidyverse)
test_df |>
  mutate(
    # TRUE for a row directly between an "Upstream" and a "Downstream" row
    sandwich = lag(exon_identity == 'Upstream') & lead(exon_identity == 'Downstream')
  ) |>
  replace_na(list(sandwich = FALSE)) |>
  group_by(gene_id) |>
  # note: arrange() ignores the grouping unless .by_group = TRUE is set
  arrange(start) |>
  ungroup() |>
  filter(!sandwich) |>
  select(-sandwich)

(In the toy example, group_by and ungroup are not needed. I added them in case it was needed/useful in the real data set.)
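
If the real data set does need per-gene handling, one possible grouped variant is sketched below. It assumes, as the answer above does, that each "Event" row sorts by start into the position of the row it replaces; `coalesce()` and `.by_group = TRUE` are swapped in here and are not part of the original answer.

```r
library(dplyr)

test_df <- data.frame(
  start = c(2, 9, 13, 19, 13, 20, 25, 35, 39),
  end = c(8, 12, 18, 24, 16, 24, 30, 38, 45),
  gene_id = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  exon_identity = c(NA, "Upstream", NA, "Downstream", "Event",
                    NA, "Upstream", "Downstream", NA)
)

result <- test_df |>
  group_by(gene_id) |>
  mutate(
    # Flag rows directly between "Upstream" and "Downstream" within a gene;
    # coalesce() turns the NA results of the comparison into FALSE
    sandwich = coalesce(
      lag(exon_identity) == "Upstream" & lead(exon_identity) == "Downstream",
      FALSE
    )
  ) |>
  # Sort by start inside each gene so the "Event" row moves into place
  arrange(start, .by_group = TRUE) |>
  ungroup() |>
  filter(!sandwich) |>
  select(-sandwich)

result
```

Because `lag()` and `lead()` run after `group_by()`, the flag cannot be triggered by an "Upstream" row at the end of one gene and a "Downstream" row at the start of the next.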

Answer 2

Score: 0

If @MelissaKey is right about the structure of your actual data, their solution will work nicely. Otherwise, here’s a function that does the job along with group_modify():

library(dplyr)
library(tidyr)
replace_rows <- function(x, ...) {
  is_bad <- replace_na(
    lag(x$exon_identity) == "Upstream" & lead(x$exon_identity) == "Downstream",
    FALSE
  )
  if (any(is_bad)) {
    is_event <- replace_na(x$exon_identity == "Event", FALSE)
    x <- x %>%
      filter(!is_bad, !is_event) %>%
      add_row(
        filter(x, is_event),
        .before = which(is_bad)
      )
  }
  x
}
test_df %>%
  group_by(gene_id) %>%
  group_modify(replace_rows) %>%
  ungroup()

# A tibble: 8 × 4
  gene_id start   end exon_identity
  <chr>   <dbl> <dbl> <chr>
1 A           2     8 <NA>
2 A           9    12 Upstream
3 A          13    16 Event
4 A          19    24 Downstream
5 B          20    24 <NA>
6 B          25    30 Upstream
7 B          35    38 Downstream
8 B          39    45 <NA>
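
The key step in `replace_rows()` is the `.before` argument of `add_row()`, which controls where the new row lands. A minimal sketch of that behaviour on its own, using a small made-up two-row tibble rather than the full data set:

```r
library(tibble)

# Hypothetical two-row tibble standing in for a gene after the
# unwanted row and the "Event" row have been filtered out
df <- tibble(
  start = c(9, 19),
  exon_identity = c("Upstream", "Downstream")
)

# Insert the "Event" row before row 2, i.e. between the two existing rows
with_event <- add_row(df, start = 13, exon_identity = "Event", .before = 2)

with_event
```

Without `.before` (or `.after`), `add_row()` appends the new row at the end, which is why the function above passes the position of the row being replaced.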
