Subset the rows in a dataframe that match multiple conditions

Question

I have a dataframe like below:

    dput(trans_eqtl[1:3,1:10])
    structure(list(Gene = c("ENSG00000132819", "ENSG00000101162",
    "ENSG00000132819"), `Gene-Chr` = c(20, 20, 20), `Gene-Pos` = c(55975426,
    57598009, 55975426), RsId = c("rs6084653", "rs156356", "rs1741314"
    ), `SNP-Chr` = c(20, 20, 20), `SNP-Pos` = c(4157072, 1819280,
    4155193), start = c(57391407, 59019254, 57391407), end = c(57409333,
    59025466, 57409333), Ds_cismb = c(56391407, 58019254, 56391407
    ), De_cismb = c(58409333, 60025466, 58409333)), row.names = c(NA,
    3L), class = "data.frame")

I am trying to keep only the rows that satisfy the following condition:

I want to filter SNPs by position: if a SNP's position (SNP-Pos) is greater than De_cismb or less than Ds_cismb, the row should be kept and added to the table trans_snp.

I tried this code but it doesn't give me the right subset:

    ## check for trans_Snp
    trans_snp <- NULL
    for(i in 1:dim(trans_eqtl)[1]){
      if((trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])==TRUE | (trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])==TRUE){
        x <- which(trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])
        y <- which(trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])
        value <- trans_eqtl[x,]
        value <- trans_eqtl[y,]
      }
      trans_snp <- rbind(trans_snp,value)
    }

This is the output dataframe that I am getting:

    dput(trans_snp[1:4,1:10])
    structure(list(Gene = c("ENSG00000132819", "ENSG00000132819",
    "ENSG00000132819", "ENSG00000132819"), `Gene-Chr` = c(20, 20,
    20, 20), `Gene-Pos` = c(55975426, 55975426, 55975426, 55975426
    ), RsId = c("rs6084653", "rs6084653", "rs6084653", "rs6084653"
    ), `SNP-Chr` = c(20, 20, 20, 20), `SNP-Pos` = c(4157072, 4157072,
    4157072, 4157072), start = c(57391407, 57391407, 57391407, 57391407
    ), end = c(57409333, 57409333, 57409333, 57409333), Ds_cismb = c(56391407,
    56391407, 56391407, 56391407), De_cismb = c(58409333, 58409333,
    58409333, 58409333)), row.names = c(NA, 4L), class = "data.frame")

It is filled only with the first row of the input data frame, repeated.
Does anyone know where I am making the mistake?

Answer 1

Score: 2

If I understand correctly, there is no need for a loop. R is vectorized and vectorized comparisons will give you logical index vectors. Combine those vectors with the logical condition you want and extract those rows from the original data set.

    i <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb
    j <- trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
    trans_snp <- trans_eqtl[i | j, ]

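Or, equivalently as long as none of the compared values is NA (with missing values, which() drops those rows, whereas the plain logical index returns an all-NA row for each of them),

    trans_snp <- trans_eqtl[which(i | j), ]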
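
As for why the loop in the question keeps returning the first row: inside the loop each comparison is made on a single element, so which() receives a length-one logical vector and the only index it can return is 1 (or nothing at all). trans_eqtl[y, ] is therefore always row 1, whatever the value of i. The sketch below illustrates this with the positions from the question plus one made-up row (rs_hypothetical, not part of the original data) whose SNP lies inside the window, so that the filter has something to drop.

```r
# The first three rows use values from the question; the fourth row is a
# made-up example (an assumption) whose SNP lies inside the window.
trans_eqtl <- data.frame(
  RsId      = c("rs6084653", "rs156356", "rs1741314", "rs_hypothetical"),
  `SNP-Pos` = c(4157072, 1819280, 4155193, 57000000),
  Ds_cismb  = c(56391407, 58019254, 56391407, 56391407),
  De_cismb  = c(58409333, 60025466, 58409333, 58409333),
  check.names = FALSE   # keep the hyphen in the column name `SNP-Pos`
)

# Scalar comparison, as in the loop: which() gets a length-one vector,
# so the index it returns refers to that vector, not to the row tested.
scalar_index <- which(trans_eqtl$`SNP-Pos`[2] < trans_eqtl$Ds_cismb[2])
scalar_index

# Vectorised comparison: one TRUE/FALSE per row of the data frame.
outside   <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb |
             trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
trans_snp <- trans_eqtl[outside, ]
trans_snp$RsId   # the made-up in-window row is not included
```

Here scalar_index is 1 even though row 2 was tested, which is why the loop appended row 1 on every pass.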

Answer 2

Score: 2

In dplyr:

    library(dplyr)
    trans_snp <- trans_eqtl %>%
      filter(`SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb)
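
If dplyr is not installed, base R's subset() accepts the same kind of condition and, like filter(), leaves out rows where the condition evaluates to NA. This is a swapped-in base R alternative, not the answer's own method. The sketch uses the three positions from the question plus one made-up row (rs_hypothetical, an assumption) with a missing position to show the NA handling.

```r
# Three rows from the question plus one made-up row with a missing position
# (the NA row is an assumption, added only to show how NA is handled).
trans_eqtl <- data.frame(
  RsId      = c("rs6084653", "rs156356", "rs1741314", "rs_hypothetical"),
  `SNP-Pos` = c(4157072, 1819280, 4155193, NA),
  Ds_cismb  = c(56391407, 58019254, 56391407, 56391407),
  De_cismb  = c(58409333, 60025466, 58409333, 58409333),
  check.names = FALSE   # keep the hyphen in the column name `SNP-Pos`
)

# Same condition as in the filter() call, written with base R's subset().
# Column names are looked up inside the data frame, so no trans_eqtl$ prefix.
trans_snp <- subset(trans_eqtl, `SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb)
nrow(trans_snp)   # the row with the missing position is not kept
```

subset() is convenient interactively; for code inside functions, the explicit logical indexing from Answer 1 is usually the safer choice.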

huangapple
  • Posted on May 20, 2023 at 21:17:47
  • Please keep this link when reposting: https://go.coder-hub.com/76295436.html