Subset the rows in a dataframe that match multiple conditions

Question

I have a dataframe like the one below:

dput(trans_eqtl[1:3,1:10])
structure(list(Gene = c("ENSG00000132819", "ENSG00000101162", 
"ENSG00000132819"), `Gene-Chr` = c(20, 20, 20), `Gene-Pos` = c(55975426, 
57598009, 55975426), RsId = c("rs6084653", "rs156356", "rs1741314"
), `SNP-Chr` = c(20, 20, 20), `SNP-Pos` = c(4157072, 1819280, 
4155193), start = c(57391407, 59019254, 57391407), end = c(57409333, 
59025466, 57409333), Ds_cismb = c(56391407, 58019254, 56391407
), De_cismb = c(58409333, 60025466, 58409333)), row.names = c(NA, 
3L), class = "data.frame")

I am trying to keep only the rows whose columns satisfy the following condition:

I want to filter SNPs by position: if the SNP position is greater than De_cismb or less than Ds_cismb, keep the row and add it to the table trans_snp.

I tried this code, but it doesn't give me the right subset:

## check for trans_snp

trans_snp <- NULL
for(i in 1:dim(trans_eqtl)[1]){
  if((trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])==TRUE | (trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])==TRUE){
    x <- which(trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])
    y <- which(trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])
    value <- trans_eqtl[x,]
    value <- trans_eqtl[y,]
  }

  trans_snp <- rbind(trans_snp,value)
}

This is the output dataframe that I am getting:

dput(trans_snp[1:4,1:10])
structure(list(Gene = c("ENSG00000132819", "ENSG00000132819", 
"ENSG00000132819", "ENSG00000132819"), `Gene-Chr` = c(20, 20, 
20, 20), `Gene-Pos` = c(55975426, 55975426, 55975426, 55975426
), RsId = c("rs6084653", "rs6084653", "rs6084653", "rs6084653"
), `SNP-Chr` = c(20, 20, 20, 20), `SNP-Pos` = c(4157072, 4157072, 
4157072, 4157072), start = c(57391407, 57391407, 57391407, 57391407
), end = c(57409333, 57409333, 57409333, 57409333), Ds_cismb = c(56391407, 
56391407, 56391407, 56391407), De_cismb = c(58409333, 58409333, 
58409333, 58409333)), row.names = c(NA, 4L), class = "data.frame")

It is filled only with the first row of the input dataframe.
Does anyone know where I am making the mistake?

Answer 1

Score: 2

If I understand correctly, there is no need for a loop. R is vectorized and vectorized comparisons will give you logical index vectors. Combine those vectors with the logical condition you want and extract those rows from the original data set.

i <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb
j <- trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
trans_snp <- trans_eqtl[i | j, ]

Or, equivalently,

trans_snp <- trans_eqtl[which(i | j), ]
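
To make the difference from the loop concrete, here is a minimal sketch. It rebuilds only the key columns of the three sample rows from the question and adds one hypothetical row (`rs_hypothetical`, not in the original data) whose SNP lies inside the window, so that the filter has something to drop. The name `outside` is also just illustrative.

```r
# Key columns of the three sample rows from the question, plus one
# hypothetical row (rs_hypothetical) whose SNP lies inside the window.
trans_eqtl <- data.frame(
  RsId      = c("rs6084653", "rs156356", "rs1741314", "rs_hypothetical"),
  `SNP-Pos` = c(4157072, 1819280, 4155193, 57000000),
  Ds_cismb  = c(56391407, 58019254, 56391407, 56391407),
  De_cismb  = c(58409333, 60025466, 58409333, 58409333),
  check.names = FALSE,       # keep the hyphen in `SNP-Pos`
  stringsAsFactors = FALSE
)

# Inside the loop the comparison involves a single row, so which() can
# only return 1 (TRUE) or integer(0) (FALSE), never the row number i.
which(trans_eqtl$`SNP-Pos`[2] < trans_eqtl$Ds_cismb[2])  # 1, not 2

# Vectorized filter: one logical value per row, then index once.
outside <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb |
  trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
trans_snp <- trans_eqtl[outside, ]
trans_snp$RsId  # the three rows from the question; the hypothetical one is dropped
```

This is why the loop in the question keeps appending row 1: `trans_eqtl[y, ]` is evaluated with `y` equal to 1 on every matching iteration. In addition, `rbind()` sits outside the `if`, so a row would be appended even when the condition is false.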

Answer 2

Score: 2

In dplyr:

library(dplyr)

trans_eqtl %>%
  filter(`SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb) -> trans_snp
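
If dplyr is not available, base R's `subset()` can express the same filter. The sketch below also shows how missing positions behave, which is the one case where the two indexing forms in the first answer differ. The data set is hypothetical (the rows `rs_missing` and `rs_inside` are invented for illustration), as is the name `cond`.

```r
# Hypothetical mini data set: the second row has a missing SNP position,
# the third lies inside the window.
trans_eqtl <- data.frame(
  RsId      = c("rs6084653", "rs_missing", "rs_inside"),
  `SNP-Pos` = c(4157072, NA, 57000000),
  Ds_cismb  = c(56391407, 56391407, 56391407),
  De_cismb  = c(58409333, 58409333, 58409333),
  check.names = FALSE,       # keep the hyphen in `SNP-Pos`
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside the data frame and treats NA as FALSE.
trans_snp <- subset(trans_eqtl, `SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb)
trans_snp$RsId

# Plain logical indexing keeps an all-NA row for the missing position.
cond <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb |
  trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
nrow(trans_eqtl[cond, ])         # 2: one real row plus one NA row
nrow(trans_eqtl[which(cond), ])  # 1: which() drops the NA
```

As far as I know, `dplyr::filter()` drops rows where the condition is `NA` in the same way `subset()` does, so the choice only matters if you index with a bare logical vector.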

huangapple
  • Posted on 2023-05-20 21:17:47
  • Link to this article: https://go.coder-hub.com/76295436.html