Subset vs left_join in R to filter a dataframe

Question

I have a general question. I am using two different ways to subset my data frame in R, and to my surprise I get a different number of rows from each, even though the input data is the same. Can someone please explain the logic behind this?

Method 1

library(dplyr)  # provides left_join()
# Keep the male rows of ph_table and collect their unique IDs
Male <- ph_table[(ph_table$Sex == "M"), ]
MaleID <- unique(Male$ID)
MaleID <- as.data.frame(MaleID)
colnames(MaleID)[1] <- "Tumor_Sample_Barcode"
# Join Big_data onto the male IDs
Male_maf <- left_join(MaleID, Big_data, by = "Tumor_Sample_Barcode")
dim(Male_maf)
[1] 1983  133

Method 2

# Rename ID so it matches the key column of Big_data
colnames(ph_table)[which(names(ph_table) == "ID")] <- "Tumor_Sample_Barcode"
Male_samples <- unique(subset(ph_table, Sex == "M")$Tumor_Sample_Barcode)
# Keep the rows of Big_data whose barcode is one of the male samples
M_maf <- subset(Big_data, Tumor_Sample_Barcode %in% Male_samples)
dim(M_maf)
[1] 1885  133

Does this make sense?

Answer 1

Score: 1

Your left_join(MaleID, Big_data, by = "Tumor_Sample_Barcode") keeps all rows of MaleID, whether or not they have a match in Big_data. An ID with no match still produces one row, with NA in every Big_data column.
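
A minimal sketch of that behaviour on hypothetical toy data (the barcodes S1 to S4 and the `gene` column are made up for illustration). It uses base R's merge() with all.x = TRUE as a stand-in for dplyr's left_join(), so it runs without extra packages:

```r
# Hypothetical toy data: three male IDs, only two of which appear in Big_data
MaleID <- data.frame(Tumor_Sample_Barcode = c("S1", "S2", "S3"))
Big_data <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S4"),
  gene = c("TP53", "KRAS", "EGFR", "BRAF")
)

# Left join: every row of MaleID is kept; S3 has no match in Big_data,
# so it still gets one row, with NA in the Big_data columns
Male_maf <- merge(MaleID, Big_data, by = "Tumor_Sample_Barcode", all.x = TRUE)
print(Male_maf)
nrow(Male_maf)  # 4: two rows for S1, one for S2, one NA-padded row for S3
```

S4 is dropped because it is not in MaleID, while S3 survives as an all-NA row; that padding is what inflates the row count of Method 1.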

subset(Big_data, Tumor_Sample_Barcode %in% Male_samples), on the other hand, only keeps the rows of Big_data whose Tumor_Sample_Barcode also occurs in Male_samples, i.e. barcodes that occur in both.
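
The same point as a sketch on hypothetical toy data: %in% yields one TRUE/FALSE per row of Big_data, so the filtered result can only contain rows that already exist in Big_data, and a male ID missing from Big_data simply does not appear.

```r
# Hypothetical toy data: S3 is a male sample with no rows in Big_data
Male_samples <- c("S1", "S2", "S3")
Big_data <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S4"),
  gene = c("TP53", "KRAS", "EGFR", "BRAF")
)

# One logical value per row of Big_data: is this row's barcode a male sample?
keep <- Big_data$Tumor_Sample_Barcode %in% Male_samples
print(keep)  # TRUE TRUE TRUE FALSE

# subset() keeps only the TRUE rows; S3 never had a row, so it is absent
M_maf <- subset(Big_data, Tumor_Sample_Barcode %in% Male_samples)
nrow(M_maf)  # 3
```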

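If you use inner_join instead of left_join, the two methods will return the same rows (the row and column order may differ). Since MaleID contains unique IDs, the 98 extra rows in Method 1 (1983 - 1885) most likely correspond to male IDs that have no match in Big_data.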
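
A sketch that checks both claims on hypothetical toy data. Base R's merge() again stands in for dplyr: its default (all = FALSE) behaves like an inner join, and all.x = TRUE like a left join. The helper sort_rows() is hypothetical and only puts both results in the same row and column order before comparing them.

```r
# Hypothetical toy data; stringsAsFactors = FALSE so identical() compares
# plain character columns on any R version
MaleID <- data.frame(Tumor_Sample_Barcode = c("S1", "S2", "S3"),
                     stringsAsFactors = FALSE)
Big_data <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S4"),
  gene = c("TP53", "KRAS", "EGFR", "BRAF"),
  stringsAsFactors = FALSE
)

inner <- merge(MaleID, Big_data, by = "Tumor_Sample_Barcode")                # inner join
left  <- merge(MaleID, Big_data, by = "Tumor_Sample_Barcode", all.x = TRUE)  # left join
filtered <- subset(Big_data, Tumor_Sample_Barcode %in% MaleID$Tumor_Sample_Barcode)

# Hypothetical helper: same column order, same row order, fresh row names
sort_rows <- function(df) {
  df <- df[order(df$Tumor_Sample_Barcode, df$gene), c("Tumor_Sample_Barcode", "gene")]
  rownames(df) <- NULL
  df
}
identical(sort_rows(inner), sort_rows(filtered))  # TRUE

# IDs with no match in Big_data account for the extra left-join rows
unmatched <- setdiff(MaleID$Tumor_Sample_Barcode, Big_data$Tumor_Sample_Barcode)
nrow(left) - nrow(filtered) == length(unmatched)  # TRUE (the only unmatched ID is "S3")
```

With the real data, length(setdiff(MaleID$Tumor_Sample_Barcode, Big_data$Tumor_Sample_Barcode)) should show how many male IDs are missing from Big_data.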

huangapple
  • Posted on 9 March 2023, 23:57:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75687071.html