2023-08-09 03:13:42


Creating new column after join based on presence/absence of variable in R

Question

I have two dfs: one with all patients (df1) and one with a subset of those patients who are getting additional treatment (df2).

I want to join the two dfs, keeping all rows in df1 and adding a new column for “0/1” if the patient is not getting the additional treatment (0) or is getting the additional treatment (1). There shouldn’t be any issues of patients in df2 who are not in df1, but if there are, I want to catch them!

It's important that I keep all of the rows in df1, as I want to compare results between df1 and df2.

Simplified example:

df1

id      date
0001    2/2/22
0002    1/1/22
0002    3/2/22
0003    3/2/22
0003    5/2/22
0004    6/7/22
0004    9/2/22

df2

id
0002
0004

Desired output:

id      date    tx
0001    2/2/22  0
0002    1/1/22  1
0002    3/2/22  1
0003    3/2/22  0
0003    5/2/22  0
0004    6/7/22  1
0004    9/2/22  1

I know that I start with a join based on ID, but I'm not sure of the most elegant way to create the new column based on the absence/presence of data in df2.

df1 %>%
  left_join(df2, by = "id")

I found this other question which is trying to do something similar, but I don't see how they joined the two datasets.

https://stackoverflow.com/questions/68528437/create-a-presence-absence-column-based-on-presence-records

Answer 1

Score: 1

In this case, you can simply create a boolean vector in base R, no need to join, merge, etc:

df$tx <- +(df$id %in% df2$id)
#    id   date tx
#1 0001 2/2/22  0
#2 0002 1/1/22  1
#3 0002 3/2/22  1
#4 0003 3/2/22  0
#5 0003 5/2/22  0
#6 0004 6/7/22  1
#7 0004 9/2/22  1

The +(...) coerces the logical vector to 1/0; df$tx <- df$id %in% df2$id on its own returns TRUE/FALSE.
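To make the coercion concrete, here is a small base R sketch. It rebuilds the sample data inline so it runs on its own. The names in_df2, tx_plus, tx_int and missing_ids are illustrative and not from the original answer. The setdiff() check is an addition, offered as one possible way to catch ids in df2 that do not appear in df, which the question asks about.

```r
# Sample data mirroring the question; ids are character so leading zeros survive
df <- data.frame(
  id   = c("0001", "0002", "0002", "0003", "0003", "0004", "0004"),
  date = c("2/2/22", "1/1/22", "3/2/22", "3/2/22", "5/2/22", "6/7/22", "9/2/22"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(id = c("0002", "0004"), stringsAsFactors = FALSE)

in_df2  <- df$id %in% df2$id     # logical vector: TRUE where the id appears in df2
tx_plus <- +in_df2               # unary plus coerces TRUE/FALSE to 1/0
tx_int  <- as.integer(in_df2)    # a more explicit spelling of the same coercion

df$tx <- tx_int

# Assumed extra check: ids present in df2 but absent from df
missing_ids <- setdiff(df2$id, df$id)
if (length(missing_ids) > 0) {
  warning("ids in df2 not found in df: ", paste(missing_ids, collapse = ", "))
}
```

Because nothing is joined, df keeps exactly its original rows and order, even when an id repeats in df2.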

Data

df <- structure(list(id = c("0001", "0002", "0002", "0003", "0003", 
"0004", "0004"), date = c("2/2/22", "1/1/22", "3/2/22", "3/2/22", 
"5/2/22", "6/7/22", "9/2/22")), row.names = c(NA, -7L), class = "data.frame")
df2 <- structure(list(id = c("0002", "0004")), class = "data.frame", row.names = c(NA, 
-2L))
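If you would rather stay with the left_join() from the question, the following is one possible sketch, assuming dplyr is installed. It is not the answer author's method. The names result and unmatched are illustrative. The idea is to attach a tx = 1 flag to df2 before joining, fill the non-matches with 0, and use anti_join() to list any df2 ids that have no counterpart in df1.

```r
library(dplyr)

# Sample data mirroring the question
df1 <- data.frame(
  id   = c("0001", "0002", "0002", "0003", "0003", "0004", "0004"),
  date = c("2/2/22", "1/1/22", "3/2/22", "3/2/22", "5/2/22", "6/7/22", "9/2/22"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(id = c("0002", "0004"), stringsAsFactors = FALSE)

# distinct() guards against duplicate ids in df2 multiplying rows of df1
flags <- df2 %>%
  distinct(id) %>%
  mutate(tx = 1L)

result <- df1 %>%
  left_join(flags, by = "id") %>%   # keeps every row of df1
  mutate(tx = coalesce(tx, 0L))     # ids with no match in df2 get 0

# Rows of df2 whose id never occurs in df1 (expected to be empty here)
unmatched <- anti_join(df2, df1, by = "id")
```

An empty unmatched confirms that every treated patient also appears in df1; any rows it does contain are the cases the question wants to catch.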
