Subset the rows in a dataframe that match multiple conditions
Question
I have a data frame like the one below:
dput(trans_eqtl[1:3,1:10])
structure(list(Gene = c("ENSG00000132819", "ENSG00000101162",
"ENSG00000132819"), `Gene-Chr` = c(20, 20, 20), `Gene-Pos` = c(55975426,
57598009, 55975426), RsId = c("rs6084653", "rs156356", "rs1741314"
), `SNP-Chr` = c(20, 20, 20), `SNP-Pos` = c(4157072, 1819280,
4155193), start = c(57391407, 59019254, 57391407), end = c(57409333,
59025466, 57409333), Ds_cismb = c(56391407, 58019254, 56391407
), De_cismb = c(58409333, 60025466, 58409333)), row.names = c(NA,
3L), class = "data.frame")
I am trying to keep only the rows that match the following condition:
I want to filter SNPs based on their position: if the SNP position is greater than De_cismb or less than Ds_cismb, keep the row and add it to the table trans_snp.
I tried this code but it doesn't give me the right subset:
##check for trans_Snp
trans_snp <- NULL
for(i in 1:dim(trans_eqtl)[1]){
if((trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])==TRUE | (trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])==TRUE){
x <- which(trans_eqtl$`SNP-Pos`[i] > trans_eqtl$De_cismb[i])
y <- which(trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])
value <- trans_eqtl[x,]
value <- trans_eqtl[y,]
}
trans_snp <- rbind(trans_snp,value)
}
This is the output dataframe that I am getting:
dput(trans_snp[1:4,1:10])
structure(list(Gene = c("ENSG00000132819", "ENSG00000132819",
"ENSG00000132819", "ENSG00000132819"), `Gene-Chr` = c(20, 20,
20, 20), `Gene-Pos` = c(55975426, 55975426, 55975426, 55975426
), RsId = c("rs6084653", "rs6084653", "rs6084653", "rs6084653"
), `SNP-Chr` = c(20, 20, 20, 20), `SNP-Pos` = c(4157072, 4157072,
4157072, 4157072), start = c(57391407, 57391407, 57391407, 57391407
), end = c(57409333, 57409333, 57409333, 57409333), Ds_cismb = c(56391407,
56391407, 56391407, 56391407), De_cismb = c(58409333, 58409333,
58409333, 58409333)), row.names = c(NA, 4L), class = "data.frame")
It is only filled with the first row of the input data frame.
Does anyone know where I am making the mistake?
Answer 1
Score: 2
If I understand correctly, there is no need for a loop. R is vectorized and vectorized comparisons will give you logical index vectors. Combine those vectors with the logical condition you want and extract those rows from the original data set.
i <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb
j <- trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
trans_snp <- trans_eqtl[i | j, ]
Or, using which(), which returns the same rows here and additionally drops any rows where the condition is NA:
trans_snp <- trans_eqtl[which(i | j), ]
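As for why the loop in the question keeps returning the first row, here is a minimal sketch using only the columns needed for the filter (the reduced data frame and the name `outside` are my own, not from the question). Inside the loop, which() is applied to a single TRUE/FALSE value, so it returns 1 (or integer(0)), never the loop index i, and trans_eqtl[y, ] therefore always picks row 1:

```r
# First three rows from the question, reduced to the columns the filter uses
trans_eqtl <- data.frame(
  RsId = c("rs6084653", "rs156356", "rs1741314"),
  `SNP-Pos` = c(4157072, 1819280, 4155193),
  Ds_cismb = c(56391407, 58019254, 56391407),
  De_cismb = c(58409333, 60025466, 58409333),
  check.names = FALSE  # keep the non-syntactic name `SNP-Pos` as-is
)

# What the loop does at i = 2: which() sees a length-one logical,
# so the answer is position 1 within that vector, not row 2
i <- 2
y <- which(trans_eqtl$`SNP-Pos`[i] < trans_eqtl$Ds_cismb[i])
y

# Vectorized version: one logical value per row, no loop needed
outside <- trans_eqtl$`SNP-Pos` > trans_eqtl$De_cismb |
  trans_eqtl$`SNP-Pos` < trans_eqtl$Ds_cismb
trans_snp <- trans_eqtl[outside, ]
trans_snp$RsId
```

With these three rows every SNP position is below Ds_cismb, so all three rows are kept. If you did want to keep a loop, indexing with trans_eqtl[i, ] instead of trans_eqtl[y, ] would pick the intended row, but the vectorized form is shorter and faster.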
Answer 2
Score: 2
In dplyr:
library(dplyr)
trans_eqtl %>%
filter(`SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb) -> trans_snp
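A quick way to check the filter, as a sketch: the first three rows below come from the question, while the fourth row ("rs_hypothetical", positioned inside the Ds_cismb to De_cismb window) is made up purely to show a row being excluded. Backticks are needed around `SNP-Pos` because the hyphen makes it a non-syntactic name.

```r
library(dplyr)

trans_eqtl <- data.frame(
  # the last entry in each column is a hypothetical row, not from the question
  RsId = c("rs6084653", "rs156356", "rs1741314", "rs_hypothetical"),
  `SNP-Pos` = c(4157072, 1819280, 4155193, 57000000),
  Ds_cismb = c(56391407, 58019254, 56391407, 56391407),
  De_cismb = c(58409333, 60025466, 58409333, 58409333),
  check.names = FALSE  # keep the non-syntactic name `SNP-Pos` as-is
)

# Keep rows whose SNP position lies outside the [Ds_cismb, De_cismb] window
trans_snp <- trans_eqtl %>%
  filter(`SNP-Pos` > De_cismb | `SNP-Pos` < Ds_cismb)

trans_snp$RsId
```

Note that filter() keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped, which matches the which() variant in base R.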