May 20, 2023 21:17:47 · go

Subset the rows in a dataframe that match multiple conditions

Question

I have a data frame like the one below:

dput(trans_eqtl[1:3,1:10])
structure(list(Gene = c("ENSG00000132819", "ENSG00000101162", 
"ENSG00000132819"), `Gene-Chr` = c(20, 20, 20), `Gene-Pos` = c(55975426, 
57598009, 55975426), RsId = c("rs6084653", "rs156356", "rs1741314"
), `SNP-Chr` = c(20, 20, 20), `SNP-Pos` = c(4157072, 1819280, 
4155193), start = c(57391407, 59019254, 57391407), end = c(57409333, 
59025466, 57409333), Ds_cismb = c(56391407, 58019254, 56391407
), De_cismb = c(58409333, 60025466, 58409333)), row.names = c(NA, 
3L), class = "data.frame")

I am trying to keep only the rows whose columns match the following condition:

I want to filter SNPs based on their position: if the SNP position is greater than De_cismb or less than Ds_cismb, keep that row and add it to the table trans_snp.

I tried this code but it doesn't give me the right subset:

##check for trans_Snp

trans_snp <- NULL
for(i in 1:dim(trans_eqtl)[1]){
  if((trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])==TRUE | (trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])==TRUE){
    x <- which(trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])
    y <- which(trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])
    value <- trans_eqtl[x,]
    value <- trans_eqtl[y,]
  }
  trans_snp <- rbind(trans_snp,value)
}

This is the output dataframe that I am getting:

dput(trans_snp[1:4,1:10])
structure(list(Gene = c("ENSG00000132819", "ENSG00000132819", 
"ENSG00000132819", "ENSG00000132819"), `Gene-Chr` = c(20, 20, 
20, 20), `Gene-Pos` = c(55975426, 55975426, 55975426, 55975426
), RsId = c("rs6084653", "rs6084653", "rs6084653", "rs6084653"
), `SNP-Chr` = c(20, 20, 20, 20), `SNP-Pos` = c(4157072, 4157072, 
4157072, 4157072), start = c(57391407, 57391407, 57391407, 57391407
), end = c(57409333, 57409333, 57409333, 57409333), Ds_cismb = c(56391407, 
56391407, 56391407, 56391407), De_cismb = c(58409333, 58409333, 
58409333, 58409333)), row.names = c(NA, 4L), class = "data.frame")

It's only filled with the first row of the input data frame.
Does anyone know where I am making the mistake?

Answer 1

Score: 2

If I understand correctly, there is no need for a loop. R is vectorized and vectorized comparisons will give you logical index vectors. Combine those vectors with the logical condition you want and extract those rows from the original data set.

i <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb
j <- trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
trans_snp <- trans_eqtl[i | j, ]

Or, equivalently as long as i | j contains no NA values (which() also drops rows where the condition is NA):

trans_snp <- trans_eqtl[which(i | j), ]
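A minimal sketch of why the loop in the question kept returning the first row, using only the four columns the condition needs from the three sample rows (the smaller data frame is an illustration, not the full data). Inside the loop, trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i] is a single TRUE or FALSE, so which() on it returns 1 or integer(0), never i; trans_eqtl[y, ] therefore always selects row 1. The loop also overwrites value with the y result, and re-appends the previous value whenever the condition is FALSE. The corrected loop below is only for comparison; the vectorized form from the answer is the idiomatic choice.

```r
# Rebuild the relevant columns of the three sample rows from the question.
# check.names = FALSE keeps the hyphenated column name `SNP-Pos` intact.
trans_eqtl <- data.frame(
  RsId      = c("rs6084653", "rs156356", "rs1741314"),
  `SNP-Pos` = c(4157072, 1819280, 4155193),
  Ds_cismb  = c(56391407, 58019254, 56391407),
  De_cismb  = c(58409333, 60025466, 58409333),
  check.names = FALSE
)

# which() on a single logical value returns position 1 (or nothing),
# not the loop index, which is why the original loop always picked row 1.
which(TRUE)   # 1
which(FALSE)  # integer(0)

# Vectorized version from the answer: one comparison per row, combined with |.
i <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb
j <- trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
trans_snp <- trans_eqtl[i | j, ]

# If a loop is really wanted, collect the row index itself and subset once.
keep <- integer(0)
for (k in seq_len(nrow(trans_eqtl))) {
  pos <- trans_eqtl$`SNP-Pos`[k]
  if (pos > trans_eqtl$De_cismb[k] || pos < trans_eqtl$Ds_cismb[k]) {
    keep <- c(keep, k)
  }
}
trans_snp_loop <- trans_eqtl[keep, ]

print(trans_snp$RsId)
print(trans_snp_loop$RsId)
```

With these three sample rows every SNP position lies below Ds_cismb, so both versions keep all three rows, each one exactly once.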

Answer 2

Score: 2

In dplyr:

library(dplyr)
trans_eqtl %>%
  filter(`SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb) -> trans_snp
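dplyr's filter() keeps only rows where the condition is TRUE, so rows where it is NA (for example, a missing `SNP-Pos`) are dropped, whereas plain logical indexing such as trans_eqtl[i | j, ] returns an all-NA row for them. A minimal sketch of the same filter in base R with subset(), which also drops NA rows; the fourth row (rs_missing) is a hypothetical addition used only to show the NA behaviour.

```r
# The three question rows plus one hypothetical row with a missing SNP position.
trans_eqtl <- data.frame(
  RsId      = c("rs6084653", "rs156356", "rs1741314", "rs_missing"),
  `SNP-Pos` = c(4157072, 1819280, 4155193, NA),
  Ds_cismb  = c(56391407, 58019254, 56391407, 56391407),
  De_cismb  = c(58409333, 60025466, 58409333, 58409333),
  check.names = FALSE
)

# Base R counterpart of the dplyr filter(); backticks handle the hyphenated name.
# subset() treats an NA condition as FALSE, like filter().
trans_snp <- subset(trans_eqtl, `SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb)

# Plain logical indexing: the NA condition yields a row filled with NA.
cond <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb |
  trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
with_na <- trans_eqtl[cond, ]

print(nrow(trans_snp))  # 3
print(nrow(with_na))    # 4 (the last row is all NA)
```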
