Subset vs left_join in R to filter a dataframe
Question
I have a general question. I am using two different ways to subset my data frame in R. To my surprise, the two methods return a different number of rows, although it should not be like this since the input data is the same. Can someone please explain the logic behind this?
Method 1:
library(dplyr)  # provides left_join
Male <- ph_table[ph_table$Sex == "M", ]
MaleID <- unique(Male$ID)
MaleID <- as.data.frame(MaleID)
colnames(MaleID)[1] <- "Tumor_Sample_Barcode"
Male_maf <- left_join(MaleID, Big_data, by = "Tumor_Sample_Barcode")
dim(Male_maf)
[1] 1983 133
Method 2:
colnames(ph_table)[which(names(ph_table) == "ID")] <- "Tumor_Sample_Barcode"
Male_samples <- unique(subset(ph_table, Sex == "M")$Tumor_Sample_Barcode)
M_maf <- subset(Big_data, Tumor_Sample_Barcode %in% Male_samples)
dim(M_maf)
[1] 1885 133
Does it make any sense?
Answer 1
Score: 1
Your left_join(MaleID, Big_data, by = "Tumor_Sample_Barcode") keeps all rows of MaleID, whether or not they have matches in Big_data.
subset(Big_data, Tumor_Sample_Barcode %in% Male_samples), on the other hand, only keeps the rows of Big_data whose Tumor_Sample_Barcode values occur in both.
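The difference can be reproduced on a small scale. The sketch below uses made-up toy data (the sample IDs, the `Gene` column, and its values are hypothetical stand-ins, not taken from the question) in which one male sample, `S4`, has no rows in `Big_data`:

```r
library(dplyr)

# Hypothetical toy data standing in for ph_table and Big_data
ph_table <- data.frame(
  ID  = c("S1", "S2", "S3", "S4"),
  Sex = c("M", "M", "F", "M"),
  stringsAsFactors = FALSE
)
Big_data <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3"),
  Gene = c("TP53", "KRAS", "TP53", "EGFR"),
  stringsAsFactors = FALSE
)

# Unique male IDs: S1, S2, S4
MaleID <- data.frame(
  Tumor_Sample_Barcode = unique(ph_table$ID[ph_table$Sex == "M"]),
  stringsAsFactors = FALSE
)

# Method 1: left_join keeps every row of MaleID.
# S4 has no match in Big_data, so it still appears once, with NA in Gene.
Male_maf <- left_join(MaleID, Big_data, by = "Tumor_Sample_Barcode")

# Method 2: subset keeps only rows that already exist in Big_data.
M_maf <- subset(Big_data, Tumor_Sample_Barcode %in% MaleID$Tumor_Sample_Barcode)

nrow(Male_maf)  # 4: two rows for S1, one for S2, one NA-padded row for S4
nrow(M_maf)     # 3: S4 contributes nothing
```

If the same thing is happening in the question's data, the 98 extra rows from Method 1 (1983 - 1885) would be male IDs that are absent from Big_data, although that is an inference from the row counts rather than something the post confirms.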
If you use inner_join instead of left_join, they will be equivalent.
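As a rough check of that equivalence, the sketch below compares row counts on hypothetical toy data (the IDs and the `Gene` column are invented for illustration). It also uses `semi_join`, which filters `Big_data` without adding columns, and `anti_join`, which lists the IDs that `left_join` would pad with NA. The row counts agree here because the IDs in `MaleID` are unique; row and column order may still differ between the results.

```r
library(dplyr)

# Hypothetical toy data: S4 is a male ID with no rows in Big_data
MaleID <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S4"),
  stringsAsFactors = FALSE
)
Big_data <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3"),
  Gene = c("TP53", "KRAS", "TP53", "EGFR"),
  stringsAsFactors = FALSE
)

# Three ways to keep only the Big_data rows that belong to male samples
by_subset <- subset(Big_data, Tumor_Sample_Barcode %in% MaleID$Tumor_Sample_Barcode)
by_inner  <- inner_join(MaleID, Big_data, by = "Tumor_Sample_Barcode")
by_semi   <- semi_join(Big_data, MaleID, by = "Tumor_Sample_Barcode")

# IDs present in MaleID but missing from Big_data
unmatched <- anti_join(MaleID, Big_data, by = "Tumor_Sample_Barcode")

# All three filters return the same number of rows
c(nrow(by_subset), nrow(by_inner), nrow(by_semi))

# left_join would return the matched rows plus one row per unmatched ID
nrow(by_inner) + nrow(unmatched)
```

Running `anti_join` on the real `MaleID` and `Big_data` should show which samples account for the gap between the two methods.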