2023年5月25日 05:12:47go评论98阅读模式

英文:

Exclude non-matching strings from data frame

问题

我有两个数据框DFA和DFB，它们都有一个共同的ID列ID3。我想要找出在DFA中存在但在DFB中缺失的ID3。然后，我想要将DFA和DFB合并在一起，但只保留共同的ID3。

以下是你尝试的代码，我将其翻译为中文：

# 找出在DFA中存在但在DFB中缺失的ID3
DIFFERENT_IDS <- anti_join(DFA, DFB, by = "ID2")
my_ID2_N <- as.vector(unlist(DIFFERENT_IDS$ID2))
# 从DFA中删除不包含在my_ID2_N中的ID3
DFA1 <- DFA
DFA1 <- DFA1[!(DFA1$ID2 %in% my_ID2_N),]

这段代码看起来基本正确，但你提到可能存在ID2多次出现的情况。如果你的目标是仅保留在两个数据框中都有的ID3，你可以考虑使用semi_join而不是anti_join来查找共同的ID3，然后合并数据框。以下是更新后的代码：

# 找出在DFA和DFB中都存在的ID3
COMMON_IDS <- semi_join(DFA, DFB, by = "ID2")
# 根据共同的ID3合并DFA和DFB
result <- merge(DFA, DFB, by = "ID2")
# 选取需要的列
result <- result[, c("ID3", "Time", "Day", "Month", "Year", "First", "Last", "Age")]

这段代码会找到在DFA和DFB中都存在的ID3，然后将它们合并在一起，并只选择所需的列。

英文:

I have two data frames DFA and DFB with a common ID column ID3. I want to identify the ID3's which are present in DFA but missing from DFB. I then want to merge DFA and DFB together, but keeping only the common ID3's.

DFA:

Time	Day	Month	Year	ID3
13	12	12	23	13_12_12_23
12	9	9	91	12_9_9_91
10	2	3	80	10_2_3_80

DFB:

First	Last	Age	ID3
Tom	Smith	25	13_12_12_23
Phil	Green	41	6_11_8_45
Joe	Bloggs	71	12_9_9_91
Ben	Bolton	84	2_1_9_63
Kate	Jones	34	10_2_3_80

Intended result:

ID3	Time	Day	Month	Year	First	Last	Age
13_12_12_23	13	12	12	23	Tom	Smith	25
12_9_9_91	12	9	9	91	Joe	Bloggs	71
10_2_3_80	10	2	3	80	Kate	Jones	34

I have tried the following, which doesn't seem to be identifying the different ID2's correctly

DIFFERENT_IDS <-anti_join(DFA, DFB, by = "ID2")

my_ID2_N <- as.vector(unlist(DIFFERENT_IDS$ID2))

DFA1 <- DFA

DFA1 <-DFA1[!(DFA1$ID2 %in% my_ID2_N),]

Any suggestions of where I might be going wrong? It is probably worth noting too that the actual data frames I am merging contain different numbers of rows within each of the ID2 (so we have ID2's listed multiple times for each trial sometimes, depending on the data points being reported, so perhaps this is causing the issue?)

答案1

得分: 1

你下次应该提供一个可重现的代码...
如果这个方法奏效，请告诉我。我使用左连接来只保留与DFB中匹配的ID。
在R中，使用library(data.pasta)可以获得最强大的插件（esquisse也不错）。

DFA <- data.frame(
  stringsAsFactors = FALSE,
              Time = c(13L, 12L, 10L),
               Day = c(12L, 9L, 2L),
             Month = c(12L, 9L, 3L),
              Year = c(23L, 91L, 80L),
               ID3 = c("13_12_12_23", "12_9_9_91", "10_2_3_80")
       )
DFB <- data.frame(
  stringsAsFactors = FALSE,
             First = c("Tom", "Phil", "Joe", "Ben", "Kate"),
              Last = c("Smith", "Green", "Bloggs", "Bolton", "Jones"),
               Age = c(25L, 41L, 71L, 84L, 34L),
                      ID3 = c("13_12_12_23",
                              "6_11_8_45","12_9_9_91","2_1_9_63","10_2_3_80")
       )
left_join(DFA, DFB) %>%
  relocate(ID3, .before = everything())

输出：

              ID3 Time Day Month Year First   Last Age
1 13_12_12_23   13  12    12   23   Tom  Smith  25
2   12_9_9_91   12   9     9   91   Joe Bloggs  71
3   10_2_3_80   10   2     3   80  Kate  Jones  34

英文:

You should really provide a reproducible code next time...
Let me know if that did the trick. I used left join to keep only the matching ID's from from DFB
use library(data.pasta) to have the greatest addin possible in R (esquisse is also nice)

DFA &lt;- data.frame(
  stringsAsFactors = FALSE,
              Time = c(13L, 12L, 10L),
               Day = c(12L, 9L, 2L),
             Month = c(12L, 9L, 3L),
              Year = c(23L, 91L, 80L),
               ID3 = c(&quot;13_12_12_23&quot;, &quot;12_9_9_91&quot;, &quot;10_2_3_80&quot;)
       )
DFB &lt;- data.frame(
  stringsAsFactors = FALSE,
             First = c(&quot;Tom&quot;, &quot;Phil&quot;, &quot;Joe&quot;, &quot;Ben&quot;, &quot;Kate&quot;),
              Last = c(&quot;Smith&quot;, &quot;Green&quot;, &quot;Bloggs&quot;, &quot;Bolton&quot;, &quot;Jones&quot;),
               Age = c(25L, 41L, 71L, 84L, 34L),
                      ID3 = c(&quot;13_12_12_23&quot;,
                              &quot;6_11_8_45&quot;,&quot;12_9_9_91&quot;,&quot;2_1_9_63&quot;,&quot;10_2_3_80&quot;)
       )
left_join(DFA,DFB) %&gt;% 
  relocate(ID3, .before = everything())
          ID3 Time Day Month Year First   Last Age
1 13_12_12_23   13  12    12   23   Tom  Smith  25
2   12_9_9_91   12   9     9   91   Joe Bloggs  71
3   10_2_3_80   10   2     3   80  Kate  Jones  34

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

从数据框中排除不匹配的字符串。

问题

答案1

按客户和下次交易时间分组客户会话

如何在pandas数据帧中合并两行并将其索引保存在新列中

Heatmaply 在 R Viewer 或 Markdown 中未显示，但在按下 “在新窗口中显示” 时加载。

Object of type 'closure' is not subsettable when training a geographically weighted random forest model

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。