March 9, 2023 23:57:20

Subset vs left_join in R to filter a dataframe

Question

I have a general question. I am using two different ways to subset my data frame in R, and to my surprise I get a different number of rows from each, although it should not be like this since the input data is the same. Can someone please explain the logic behind this?

Method 1

library(dplyr)  # provides left_join()

# Keep the male rows of the phenotype table and collect their unique IDs
Male <- ph_table[(ph_table$Sex == "M"), ]
MaleID <- unique(Male$ID)
MaleID <- as.data.frame(MaleID)
colnames(MaleID)[1] <- "Tumor_Sample_Barcode"
# Left join: every row of MaleID is kept, matched or not
Male_maf <- left_join(MaleID, Big_data, by = "Tumor_Sample_Barcode")
dim(Male_maf)
[1] 1983  133

Method 2

# Rename the ID column so it matches the key in Big_data
colnames(ph_table)[which(names(ph_table) == "ID")] <- "Tumor_Sample_Barcode"
Male_samples <- unique(subset(ph_table, Sex == "M")$Tumor_Sample_Barcode)
# Filter: only rows of Big_data whose barcode is in Male_samples are kept
M_maf <- subset(Big_data, Tumor_Sample_Barcode %in% Male_samples)
dim(M_maf)
[1] 1885  133

Does it make any sense?

Answer 1

Score: 1

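Your left_join(MaleID, Big_data, by = "Tumor_Sample_Barcode") keeps all rows of MaleID, whether or not they have matches in Big_data. An ID with no match still appears once in the result, with NA in every column that comes from Big_data.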

subset(Big_data, Tumor_Sample_Barcode %in% Male_samples), on the other hand, only keeps rows of Big_data whose Tumor_Sample_Barcode also occurs in Male_samples.
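
To see the difference on a small scale, here is a minimal sketch with made-up data. Only the column name Tumor_Sample_Barcode comes from the question; the sample IDs, the `gene` column, and the object names `male_ids` and `big_data` are hypothetical stand-ins for MaleID and Big_data. It assumes dplyr is installed.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-in for MaleID: three unique male sample IDs
male_ids <- data.frame(Tumor_Sample_Barcode = c("S1", "S2", "S3"))

# Hypothetical stand-in for Big_data: S3 has no rows here, S9 is not male
big_data <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S9"),
  gene = c("TP53", "KRAS", "EGFR", "BRAF")
)

# Method 1: left join keeps S3 even though it has no match (gene is NA)
left <- left_join(male_ids, big_data, by = "Tumor_Sample_Barcode")

# Method 2: filtering big_data can only return rows that exist in big_data
filtered <- subset(big_data, Tumor_Sample_Barcode %in% male_ids$Tumor_Sample_Barcode)

# IDs on the left side that have no match in big_data
unmatched <- anti_join(male_ids, big_data, by = "Tumor_Sample_Barcode")

nrow(left)       # 4
nrow(filtered)   # 3
nrow(unmatched)  # 1
```

Because the ID list is unique, the left join returns the filtered rows plus one NA-filled row per unmatched ID. If the same holds for the real data, the 98 extra rows in the question (1983 - 1885) would be male IDs that never appear in Big_data, and running anti_join on the real tables should list them.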

If you use inner_join instead of left_join, the two methods will be equivalent.
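
A small sketch of that equivalence, again on made-up data (the IDs, the `gene` column, and the object names are hypothetical, and dplyr is assumed to be installed). It also shows semi_join, a dplyr filtering join that could be used in place of the subset call.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-ins for MaleID and Big_data
male_ids <- data.frame(Tumor_Sample_Barcode = c("S1", "S2", "S3"))
big_data <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S9"),
  gene = c("TP53", "KRAS", "EGFR", "BRAF")
)

# Inner join: only IDs present in both tables survive, so S3 is dropped
inner <- inner_join(male_ids, big_data, by = "Tumor_Sample_Barcode")

# The filter from Method 2
filtered <- subset(big_data, Tumor_Sample_Barcode %in% male_ids$Tumor_Sample_Barcode)

# Filtering join: rows of big_data that have a match in male_ids
semi <- semi_join(big_data, male_ids, by = "Tumor_Sample_Barcode")

nrow(inner)     # 3
nrow(filtered)  # 3
nrow(semi)      # 3
```

The row counts agree here because the ID list has no duplicates, which unique() guarantees in the question. With duplicated IDs on the left side, inner_join would repeat the matching rows, while the %in% filter and semi_join would not.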
