2023-02-14 09:08:06

R regex to get a partial match

Question

I want to use stri_replace_all_regex to replace strings, but it fails. I would like to know whether there are other ways to do this.
Thanks to anyone who can help!

What I tried:
First attempt:

> library(stringi)
> a <- c('abc2','xycd2','mnb345','tumb b~','lymavc')
> b <- c('ab','abc','xyc','mnb','tum','mn','tumb','lym','lymav')
> stri_replace_all_regex(a, "\\b" %s+% b %s+% "\\S+", b, vectorize_all=FALSE)

However, the result is:

> c("ab","xyc","mn","tum b~","lym")

which is not what I want.
The result I want is:

> c('abc','xyc','mnb','tumb','lymav')

Second attempt:

> pattern <- paste0("\\b(", b, ")\\S+", collapse = "|")
> gsub(pattern, "\\w", a)

However, it also failed.
I am sorry that I did not express myself clearly.
In fact, I want to replace each element of a with its match from b.
As you can see, a and b share some parts on the left; I want to remove the differing part from a. The match should be greedy, though.
For example:
The result for 'tumb b~' should be 'tumb', not 'tum', and the result for 'mnb345' should be 'mnb', not 'mn'.
I have only just started learning regular expressions, so my attempts may be complex and cumbersome. Looking forward to your reply!
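
A note on the two attempts, offered as a likely explanation rather than a verified diagnosis: with vectorize_all=FALSE the patterns are applied one after another, so the shorter prefix 'ab' rewrites 'abc2' before 'abc' gets a chance to match, and in the gsub call the replacement "\\w" is not a backreference. The greedy behaviour described here could be sketched in base R as follows. This assumes the elements of b contain no regex metacharacters (they are pasted into the pattern unescaped) and that a match must start at the beginning of each string; b_sorted, pattern, found and res are illustrative names only.

```r
a <- c('abc2', 'xycd2', 'mnb345', 'tumb b~', 'lymavc')
b <- c('ab', 'abc', 'xyc', 'mnb', 'tum', 'mn', 'tumb', 'lym', 'lymav')

# Put longer candidates first so the alternation tries them before
# their shorter prefixes.
b_sorted <- b[order(nchar(b), decreasing = TRUE)]

# One anchored pattern of the form ^(lymav|tumb|abc|...)
pattern <- paste0("^(", paste(b_sorted, collapse = "|"), ")")

# Start from NA so strings without any matching prefix stay visible.
res <- rep(NA_character_, length(a))
m <- regexpr(pattern, a, perl = TRUE)
found <- as.integer(m) > 0
res[found] <- regmatches(a, m)
res
```

Sorting by decreasing length is what makes the match "greedy" here: with perl = TRUE the alternatives are tried left to right, so 'tumb' is tested before 'tum'.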

A follow-up question:

> a <- c('tums310','tums310~20','tums320')
> b <- c('tums1','tums2','tums3')

The result I want is
> "tums3" "tums3" "tums3"
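
The longest-prefix requirement also covers this follow-up data without building a regex at all. Below is a hedged sketch using base R's startsWith; longest_prefix is a hypothetical helper (not a function from any package), and it assumes that a "match" means an element of b that is a literal prefix of the string.

```r
# Hypothetical helper: for each string in x, return the longest element of
# `candidates` that is a literal prefix of it, or NA if there is none.
longest_prefix <- function(x, candidates) {
  vapply(x, function(s) {
    hits <- candidates[startsWith(s, candidates)]
    if (length(hits) == 0L) return(NA_character_)
    hits[which.max(nchar(hits))]
  }, character(1), USE.NAMES = FALSE)
}

a <- c('tums310', 'tums310~20', 'tums320')
b <- c('tums1', 'tums2', 'tums3')
longest_prefix(a, b)
```

Because the comparison is literal, characters such as '~' need no escaping, and a string with no prefix in b yields NA instead of a nearest guess.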

Answer 1

Score: 2

Maybe you are looking for adist.

a <- c('abc2','xycd2','mnb345','tumb b~','lymavc') 
b <- c('ab','abc','xyc','mnb','tum','mn','tumb','lym','lymav')
b[apply(adist(b, a) + adist(b, a, partial=TRUE), 2, which.min)]
#[1] "abc"   "xyc"   "mnb"   "tumb"  "lymav"

a <- c('tums310','tums310~20','tums320')  
b <- c('tums1','tums2','tums3')
b[apply(adist(b, a) + adist(b, a, partial=TRUE), 2, which.min)]
#[1] "tums3" "tums3" "tums3"
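
For readers unfamiliar with adist (from the utils package): adist(b, a) returns a matrix of generalized Levenshtein distances with one row per element of b and one column per element of a, while partial=TRUE scores each element of b against its best-matching substring of a. Adding the two favours candidates that both occur inside the target and are close to it in length. The wrapper below is only a repackaging of the one-liner from this answer; closest_by_adist is a hypothetical name.

```r
# Hypothetical wrapper around the one-liner from the answer.
closest_by_adist <- function(a, b) {
  # Rows correspond to b, columns to a.
  full    <- adist(b, a)                  # whole-string edit distance
  partial <- adist(b, a, partial = TRUE)  # distance to best substring of a
  # For every column (element of a), pick the row (element of b)
  # with the smallest combined distance; ties go to the first candidate.
  b[apply(full + partial, 2, which.min)]
}

a <- c('abc2', 'xycd2', 'mnb345', 'tumb b~', 'lymavc')
b <- c('ab', 'abc', 'xyc', 'mnb', 'tum', 'mn', 'tumb', 'lym', 'lymav')
closest_by_adist(a, b)
```

Note that this is a nearest-match approach: it always returns some element of b, even for a string that shares no prefix with any candidate.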

Answer 2

Score: 0

Here's a fuzzyjoin solution using the function stringdist_join:

library(fuzzyjoin)
library(dplyr)  # provides %>%, group_by() and slice_min()
stringdist_join(
  # join `b` as a data frame ...
  data.frame(b),
  # ... with `a` as a data frame:
  data.frame(a),
  # join by ...:
  by = c("b" = "a"),
  # use left join:
  mode = 'left',
  # use Jaro-Winkler distance metric:
  method = "jw",
  # enable case-insensitive matching:
  ignore_case = TRUE,
  # name for distance column:
  distance_col = 'dist') %>%
  # retain only closest matches:
  group_by(a) %>%
  slice_min(order_by = dist, n = 1)
# A tibble: 5 × 3
# Groups:   a [5]
  b     a         dist
  <chr> <chr>    <dbl>
1 abc   abc2    0.0833
2 lymav lymavc  0.0556
3 mnb   mnb345  0.167 
4 tumb  tumb b~ 0.143 
5 xyc   xycd2   0.133

Column b now contains the closest match for each value of a.
