Return anti-join of two data frames with values outside a certain percentage difference

Question

You can do this with the dplyr package in R by writing a custom function for the percentage-based anti-join. The version below assumes the join columns have the same names in both data frames and that at least one of them is non-numeric:

    library(dplyr)

    # Custom anti-join: non-numeric `by` columns must match exactly,
    # numeric `by` columns match when within `pct` of the tbl1 value
    antijoin_function <- function(tbl1, tbl2, by, pct) {
      by <- unname(by)
      num_cols <- by[vapply(by, function(col) is.numeric(tbl1[[col]]), logical(1))]
      key_cols <- setdiff(by, num_cols)
      # Pair each tbl1 row with the tbl2 rows that share its non-numeric keys
      pairs <- tbl1 %>%
        mutate(.row = row_number()) %>%
        inner_join(tbl2[by], by = key_cols, suffix = c("", ".right"))
      # A pair matches when every numeric column is within the tolerance
      ok <- rep(TRUE, nrow(pairs))
      for (col in num_cols) {
        ok <- ok & abs(pairs[[col]] - pairs[[paste0(col, ".right")]]) <= pct * abs(pairs[[col]])
      }
      # Keep the tbl1 rows that have no matching pair
      tbl1[!seq_len(nrow(tbl1)) %in% pairs$.row[ok], ]
    }

    # Define the data frames
    tbl1 <- tibble(var1 = c('r1', 'r2', 'r3', 'r4', 'r5'),
                   var2 = c('apple', 'orange', 'banana', 'strawberry', 'lime'),
                   var3 = c(1, 2, 3, 4, 5),
                   var4 = c('yes', 'no', 'yes', 'yes', 'no'))
    tbl2 <- tibble(var1 = c('r6', 'r7', 'r8', 'r9', 'r10'),
                   var2 = c('orange', 'banana', 'apple', 'lemon', 'strawberry'),
                   var3 = c(2, 3, 1.5, 10, 4.1),
                   var4 = c('no', 'yes', 'yes', 'no', 'yes'))

    # Use the custom anti-join function
    result <- antijoin_function(tbl1, tbl2,
                                by = c('var2' = 'var2', 'var3' = 'var3', 'var4' = 'var4'),
                                pct = 0.2)
    result

antijoin_function pairs each row of tbl1 with the rows of tbl2 that agree on the non-numeric columns, treats a pair as a match when every numeric column is within pct of the tbl1 value, and returns the tbl1 rows left without a match. For the data above it returns the r1 (apple) and r5 (lime) rows.

Original question (English):

I would like to compare two mixed-type data frames and return the rows that differ between them, but numeric values should only count as different when they differ by more than a certain percentage.

    tbl1 <- tibble(var1 = c('r1', 'r2', 'r3', 'r4', 'r5'),
                   var2 = c('apple', 'orange', 'banana', 'strawberry', 'lime'),
                   var3 = c(1, 2, 3, 4, 5),
                   var4 = c('yes', 'no', 'yes', 'yes', 'no'))
    tbl2 <- tibble(var1 = c('r6', 'r7', 'r8', 'r9', 'r10'),
                   var2 = c('orange', 'banana', 'apple', 'lemon', 'strawberry'),
                   var3 = c(2, 3, 1.5, 10, 4.1),
                   var4 = c('no', 'yes', 'yes', 'no', 'yes'))

I know there is dplyr::anti_join, but it only checks for exact matches. If I were OK with numeric values that are within 20% of each other, the call would look something like:

    tbl1 %>%
      antijoin_function(tbl2, by = c('var2' = 'var2', 'var3' = 'var3', 'var4' = 'var4'),
                        pct = 0.2)

And return:

    var1  var2   var3  var4
    r1    apple     1  yes
    r5    lime      5  no

The row with strawberry would not be returned because the single difference in var3 is less than 20%.

Are there any functions or packages that do this?
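To make the limitation of dplyr::anti_join concrete, here is a minimal sketch that runs an exact-match anti-join on the three comparison columns of the example data. The names `exact` and `rel_diff` are introduced only for this illustration.

```r
library(dplyr)

tbl1 <- tibble(var1 = c('r1', 'r2', 'r3', 'r4', 'r5'),
               var2 = c('apple', 'orange', 'banana', 'strawberry', 'lime'),
               var3 = c(1, 2, 3, 4, 5),
               var4 = c('yes', 'no', 'yes', 'yes', 'no'))
tbl2 <- tibble(var1 = c('r6', 'r7', 'r8', 'r9', 'r10'),
               var2 = c('orange', 'banana', 'apple', 'lemon', 'strawberry'),
               var3 = c(2, 3, 1.5, 10, 4.1),
               var4 = c('no', 'yes', 'yes', 'no', 'yes'))

# Exact-match anti-join: a row of tbl1 is dropped only if var2, var3 and var4
# all have identical values in some row of tbl2
exact <- anti_join(tbl1, tbl2, by = c("var2", "var3", "var4"))
exact$var1   # r1, r4 and r5: the strawberry row (r4) is still returned

# Relative difference in var3 for the strawberry row, measured against tbl1
rel_diff <- abs(4 - 4.1) / 4
rel_diff < 0.2   # TRUE: roughly 2.5%, so this row should count as a match
```

The exact join returns three rows instead of the two wanted, because 4 and 4.1 are not identical even though they differ by only about 2.5%.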

Answer 1

Score: 1

``` r
library(dplyr)

# Join tbl1 and tbl2 on var2; columns coming from tbl2 get the suffix ".right"
full_join(tbl1, tbl2, by = c("var2" = "var2"), suffix = c("", ".right")) %>%
  # Keep rows where var3 differs by more than 20%, or that have no match in tbl2
  filter(abs(var3 - var3.right)/var3 > 0.2 | if_all(contains(".right"), ~ is.na(.))) %>%
  # Drop the helper columns that came from tbl2
  select(-contains(".right"))
#> # A tibble: 2 × 4
#>   var1  var2   var3 var4
#>   <chr> <chr> <dbl> <chr>
#> 1 r1    apple     1 yes
#> 2 r5    lime      5 no
```

<sup>Created on 2023-05-22 with reprex v2.0.2</sup>
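As a rough illustration of how the filter in this answer decides which rows to keep, the sketch below stores the intermediate join and adds the relative difference as its own column. The names `joined` and `rel_diff` are hypothetical and not part of the answer.

```r
library(dplyr)

tbl1 <- tibble(var1 = c('r1', 'r2', 'r3', 'r4', 'r5'),
               var2 = c('apple', 'orange', 'banana', 'strawberry', 'lime'),
               var3 = c(1, 2, 3, 4, 5),
               var4 = c('yes', 'no', 'yes', 'yes', 'no'))
tbl2 <- tibble(var1 = c('r6', 'r7', 'r8', 'r9', 'r10'),
               var2 = c('orange', 'banana', 'apple', 'lemon', 'strawberry'),
               var3 = c(2, 3, 1.5, 10, 4.1),
               var4 = c('no', 'yes', 'yes', 'no', 'yes'))

# Same join as in the answer, kept as an intermediate result
joined <- full_join(tbl1, tbl2, by = "var2", suffix = c("", ".right")) %>%
  # Relative difference in var3, measured against the tbl1 value
  mutate(rel_diff = abs(var3 - var3.right) / var3)

# One row per fruit: five from tbl1 plus lemon, which exists only in tbl2
joined %>% select(var2, var3, var3.right, rel_diff)
```

Only apple has a relative difference above 0.2, so it passes the first condition. For lime every `.right` column is NA, so the `if_all()` condition is TRUE. For lemon the first condition is NA and the second is FALSE, and `filter()` drops rows whose condition evaluates to NA. Note that the join uses var2 only, so a difference in var4 alone would not be detected by this approach.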

huangapple
  • Posted on 2023-05-23 01:39:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76308695.html