June 5, 2023 22:20:43


dplyr solution to exact and partial-string join

Question


I need to join two datasets by two columns, one column by an exact match and the other one by a partial match:

pf=data.frame('exact'=c('s1','s2','s3','s2','s4'),'id_part'=c('a','a','a','b','c'), 'value'=c(1,2,3,4,5))
> pf
  exact id_part value
1    s1       a     1
2    s2       a     2
3    s3       a     3
4    s2       b     4
5    s4       c     5

and

cj=data.frame('exact'=c('s1','s1','s4','s2','s4'), 'id_part'=c('saf','r2a@ff','k5-a','6b4-d','ab1'))
> cj
  exact id_part
1    s1     saf
2    s1  r2a@ff
3    s4    k5-a
4    s2   6b4-d
5    s4     ab1

The desired outcome is a dataset with the same rows as cj plus an additional column with values from pf, such that if pf$exact == cj$exact and pf$id_part is contained in cj$id_part, then output$value <- pf$value, else output$value <- 0:

> output
  exact id_part value
1    s1     saf     1
2    s1  r2a@ff     1
3    s4    k5-a     0
4    s2   6b4-d     4
5    s4     ab1     0

That is, an exact match on the exact column and a partial match on the id_part column. I tried stringdist_inner_join(cj, pf, by=c('exact','id_part'), method='lv') and similar, but got nowhere.

any help appreciated.

Answer 1

Score: 0


You can use regex_left_join:

library(fuzzyjoin)
library(dplyr)
library(tidyr)
regex_left_join(cj, pf) |>
  mutate(value = replace_na(value, 0))
#   exact.x id_part.x exact.y id_part.y value
# 1      s1       saf      s1         a     1
# 2      s1    r2a@ff      s1         a     1
# 3      s4      k5-a    <NA>      <NA>     0
# 4      s2     6b4-d      s2         b     4
# 5      s4       ab1    <NA>      <NA>     0
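
One caveat: called without a by argument, regex_left_join(cj, pf) matches both shared columns as regular expressions, so exact is also matched partially (a pattern such as s1 would match a value like s10). Below is a minimal sketch of a stricter variant. It assumes fuzzyjoin's fuzzy_left_join with one match function per column: `==` for exact and a literal substring test for id_part. The helper name contains_fixed is made up for illustration, and the final transmute restores the three-column shape asked for in the question.

```r
library(fuzzyjoin)
library(dplyr)
library(stringr)

# Sample data from the question
pf <- data.frame(exact = c("s1", "s2", "s3", "s2", "s4"),
                 id_part = c("a", "a", "a", "b", "c"),
                 value = c(1, 2, 3, 4, 5))
cj <- data.frame(exact = c("s1", "s1", "s4", "s2", "s4"),
                 id_part = c("saf", "r2a@ff", "k5-a", "6b4-d", "ab1"))

# Hypothetical helper: TRUE where y occurs literally (not as a regex) inside x
contains_fixed <- function(x, y) str_detect(x, fixed(y))

output <- fuzzy_left_join(
  cj, pf,
  by = c("exact", "id_part"),
  # First function applies to exact, second to id_part
  match_fun = list(`==`, contains_fixed)
) |>
  # Keep cj's columns under their original names; unmatched rows get 0
  transmute(exact = exact.x, id_part = id_part.x, value = coalesce(value, 0))
```

With the sample data this should give the value column 1, 1, 0, 4, 0 from the desired output. If a row of cj matched several rows of pf, the join would return one output row per match, so the result could have more rows than cj.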
