从数据框中排除不匹配的字符串。

huangapple go评论68阅读模式
英文:

Exclude non-matching strings from data frame

问题

我有两个数据框DFADFB,它们都有一个共同的ID列ID3。我想要找出在DFA中存在但在DFB中缺失的ID3。然后,我想要将DFADFB合并在一起,但只保留共同的ID3。

以下是你尝试的代码,我将其翻译为中文:

# 找出在DFA中存在但在DFB中缺失的ID3
DIFFERENT_IDS <- anti_join(DFA, DFB, by = "ID2")
my_ID2_N <- as.vector(unlist(DIFFERENT_IDS$ID2))

# 从DFA中删除不包含在my_ID2_N中的ID3
DFA1 <- DFA
DFA1 <- DFA1[!(DFA1$ID2 %in% my_ID2_N),]

这段代码看起来基本正确,但你提到可能存在ID2多次出现的情况。如果你的目标是仅保留在两个数据框中都有的ID3,你可以考虑使用semi_join而不是anti_join来查找共同的ID3,然后合并数据框。以下是更新后的代码:

# 找出在DFA和DFB中都存在的ID3
COMMON_IDS <- semi_join(DFA, DFB, by = "ID2")

# 根据共同的ID3合并DFA和DFB
result <- merge(DFA, DFB, by = "ID2")

# 选取需要的列
result <- result[, c("ID3", "Time", "Day", "Month", "Year", "First", "Last", "Age")]

这段代码会找到在DFADFB中都存在的ID3,然后将它们合并在一起,并只选择所需的列。

英文:

I have two data frames DFA and DFB with a common ID column ID3. I want to identify the ID3's which are present in DFA but missing from DFB. I then want to merge DFA and DFB together, but keeping only the common ID3's.

DFA:

Time Day Month Year ID3
13 12 12 23 13_12_12_23
12 9 9 91 12_9_9_91
10 2 3 80 10_2_3_80

DFB:

First Last Age ID3
Tom Smith 25 13_12_12_23
Phil Green 41 6_11_8_45
Joe Bloggs 71 12_9_9_91
Ben Bolton 84 2_1_9_63
Kate Jones 34 10_2_3_80

Intended result:

ID3 Time Day Month Year First Last Age
13_12_12_23 13 12 12 23 Tom Smith 25
12_9_9_91 12 9 9 91 Joe Bloggs 71
10_2_3_80 10 2 3 80 Kate Jones 34

I have tried the following, which doesn't seem to be identifying the different ID2's correctly

DIFFERENT_IDS &lt;-anti_join(DFA, DFB, by = &quot;ID2&quot;)

my_ID2_N &lt;- as.vector(unlist(DIFFERENT_IDS$ID2))

DFA1 &lt;- DFA

DFA1 &lt;-DFA1[!(DFA1$ID2 %in% my_ID2_N),]

Any suggestions of where I might be going wrong? It is probably worth noting too that the actual data frames I am merging contain different numbers of rows within each of the ID2 (so we have ID2's listed multiple times for each trial sometimes, depending on the data points being reported, so perhaps this is causing the issue?)

答案1

得分: 1

你下次应该提供一个可重现的代码...
如果这个方法奏效,请告诉我。我使用左连接来只保留与DFB中匹配的ID。
在R中,使用library(data.pasta)可以获得最强大的插件(esquisse也不错)。

DFA <- data.frame(
  stringsAsFactors = FALSE,
              Time = c(13L, 12L, 10L),
               Day = c(12L, 9L, 2L),
             Month = c(12L, 9L, 3L),
              Year = c(23L, 91L, 80L),
               ID3 = c("13_12_12_23", "12_9_9_91", "10_2_3_80")
       )

DFB <- data.frame(
  stringsAsFactors = FALSE,
             First = c("Tom", "Phil", "Joe", "Ben", "Kate"),
              Last = c("Smith", "Green", "Bloggs", "Bolton", "Jones"),
               Age = c(25L, 41L, 71L, 84L, 34L),
                      ID3 = c("13_12_12_23",
                              "6_11_8_45","12_9_9_91","2_1_9_63","10_2_3_80")
       )

left_join(DFA, DFB) %>%
  relocate(ID3, .before = everything())

输出:

              ID3 Time Day Month Year First   Last Age
1 13_12_12_23   13  12    12   23   Tom  Smith  25
2   12_9_9_91   12   9     9   91   Joe Bloggs  71
3   10_2_3_80   10   2     3   80  Kate  Jones  34
英文:

You should really provide a reproducible code next time...
Let me know if that did the trick. I used left join to keep only the matching ID's from from DFB
use library(data.pasta) to have the greatest addin possible in R (esquisse is also nice)

DFA &lt;- data.frame(
  stringsAsFactors = FALSE,
              Time = c(13L, 12L, 10L),
               Day = c(12L, 9L, 2L),
             Month = c(12L, 9L, 3L),
              Year = c(23L, 91L, 80L),
               ID3 = c(&quot;13_12_12_23&quot;, &quot;12_9_9_91&quot;, &quot;10_2_3_80&quot;)
       )

DFB &lt;- data.frame(
  stringsAsFactors = FALSE,
             First = c(&quot;Tom&quot;, &quot;Phil&quot;, &quot;Joe&quot;, &quot;Ben&quot;, &quot;Kate&quot;),
              Last = c(&quot;Smith&quot;, &quot;Green&quot;, &quot;Bloggs&quot;, &quot;Bolton&quot;, &quot;Jones&quot;),
               Age = c(25L, 41L, 71L, 84L, 34L),
                      ID3 = c(&quot;13_12_12_23&quot;,
                              &quot;6_11_8_45&quot;,&quot;12_9_9_91&quot;,&quot;2_1_9_63&quot;,&quot;10_2_3_80&quot;)
       )

left_join(DFA,DFB) %&gt;% 
  relocate(ID3, .before = everything())

          ID3 Time Day Month Year First   Last Age
1 13_12_12_23   13  12    12   23   Tom  Smith  25
2   12_9_9_91   12   9     9   91   Joe Bloggs  71
3   10_2_3_80   10   2     3   80  Kate  Jones  34

huangapple
  • 本文由 发表于 2023年5月25日 05:12:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/76327434.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定