R regex to get a partial match
Question
I want to use stri_replace_all_regex to replace strings, but it didn't work. Are there other methods that could solve this?
Thanks to anyone who can help!
What I tried:
The first attempt:
> library(stringi)
> a <- c('abc2','xycd2','mnb345','tumb b~','lymavc')
> b <- c('ab','abc','xyc','mnb','tum','mn','tumb','lym','lymav')
> stri_replace_all_regex(a, "\\b" %s+% b %s+% "\\S+", b, vectorize_all=FALSE)
However, the result is:
> c("ab","xyc","mn","tum b~","lym")
which is not what I want.
I want the result to be:
> c('abc','xyc','mnb','tumb','lymav')
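Why the first attempt gives that result: with vectorize_all = FALSE, stri_replace_all_regex applies the patterns one after another, each to the output of the previous step, in the order they appear in b. So 'ab' rewrites 'abc2' to 'ab' before 'abc' is ever tried, 'tum' rewrites 'tumb' before 'tumb' is tried, and 'mn' later cuts 'mnb' back to 'mn'. The base-R sketch below mimics that sequential loop with Reduce() and gsub() (a stand-in for illustration, not stringi itself) and reproduces the unwanted output.

```r
a <- c('abc2','xycd2','mnb345','tumb b~','lymavc')
b <- c('ab','abc','xyc','mnb','tum','mn','tumb','lym','lymav')

# Apply "\\b<prefix>\\S+" -> "<prefix>" for each prefix in b, in order,
# feeding the result of one step into the next (like vectorize_all = FALSE)
res <- Reduce(function(s, p) gsub(paste0("\\b", p, "\\S+"), p, s, perl = TRUE),
              b, a)
res
#[1] "ab" "xyc" "mn" "tum b~" "lym"
```

Because the shorter prefixes come first in b, they always fire first, so sorting alone cannot fix this loop: a later short pattern such as 'mn' can still shorten an earlier, longer result.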
The second attempt:
> pattern <- paste0("\\b(", b, ")\\S+", collapse = "|")
> gsub(pattern, "\\w", a)
However, it also failed.
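The second attempt has two problems: "\\w" in a replacement string is not a backreference (R only recognises "\\1" to "\\9" there), and collapsing paste0("\\b(", b, ")\\S+") with "|" creates a separate capture group for every element of b, so no single backreference points at the matched prefix. A minimal sketch of a repaired version, assuming the shared part is always at the start of the string and that b contains no regex metacharacters (escape them otherwise): put one capture group around the whole alternation, list the longest alternatives first so PCRE's ordered alternation prefers 'mnb' over 'mn', and let ".*" swallow the rest.

```r
a <- c('abc2','xycd2','mnb345','tumb b~','lymavc')
b <- c('ab','abc','xyc','mnb','tum','mn','tumb','lym','lymav')

# Longest prefixes first, so the first alternative that matches is the longest one
b_sorted <- b[order(nchar(b), decreasing = TRUE)]

# One capture group, anchored at the start; ".*" drops everything after the prefix
pattern <- paste0("^(", paste(b_sorted, collapse = "|"), ").*")
res <- sub(pattern, "\\1", a, perl = TRUE)
res
#[1] "abc" "xyc" "mnb" "tumb" "lymav"
```

Elements of a that start with none of the prefixes are returned unchanged by sub().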
Sorry, it was my mistake not to explain this clearly.
What I actually want is to replace each element of a with the element of b that matches it.
As you can see, a and b share a common part on the left, and I want to remove from a whatever differs from b. But the match should be greedy.
For example:
the result for 'tumb b~' should be 'tumb', not 'tum',
and the result for 'mnb345' should be 'mnb', not 'mn'.
I'm just learning regular expressions, so my attempts may be complex and cumbersome. Looking forward to your reply!
A new question has come up:
> a <- c('tums310','tums310~20','tums320')
> b <- c('tums1','tums2','tums3')
I want the result to be:
> "tums3" "tums3" "tums3"
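A regex is not strictly required here: the "greedy" rule described above amounts to picking, for each element of a, the longest element of b that it starts with. A base-R sketch of that rule follows; longest_prefix is a hypothetical helper name, and it assumes the shared part is always a prefix, as in both examples. Elements with no matching prefix come back as NA.

```r
# For each string in x, return the longest element of `prefixes` it starts with
# (NA_character_ if none of them is a prefix)
longest_prefix <- function(x, prefixes) {
  vapply(x, function(s) {
    hits <- prefixes[startsWith(s, prefixes)]
    if (length(hits) == 0) NA_character_ else hits[which.max(nchar(hits))]
  }, character(1), USE.NAMES = FALSE)
}

longest_prefix(c('abc2','xycd2','mnb345','tumb b~','lymavc'),
               c('ab','abc','xyc','mnb','tum','mn','tumb','lym','lymav'))
#[1] "abc" "xyc" "mnb" "tumb" "lymav"

longest_prefix(c('tums310','tums310~20','tums320'), c('tums1','tums2','tums3'))
#[1] "tums3" "tums3" "tums3"
```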
Answer 1
Score: 2
Maybe you are looking for adist.
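a <- c('abc2','xycd2','mnb345','tumb b~','lymavc')
b <- c('ab','abc','xyc','mnb','tum','mn','tumb','lym','lymav')
# rows of adist(b, a) are the elements of b, columns are the elements of a;
# for each column, pick the b with the smallest full + partial distance
b[apply(adist(b, a) + adist(b, a, partial=TRUE), 2, which.min)]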
#[1] "abc" "xyc" "mnb" "tumb" "lymav"
a <- c('tums310','tums310~20','tums320')
b <- c('tums1','tums2','tums3')
b[apply(adist(b, a) + adist(b, a, partial=TRUE), 2, which.min)]
#[1] "tums3" "tums3" "tums3"
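Why summing the two distances picks the longest prefix, at least in these examples: adist(b, a, partial = TRUE) is 0 whenever b occurs verbatim inside a, so it rules out candidates that do not occur in a, while the full adist(b, a) Levenshtein distance counts every extra character of a, so among candidates that are prefixes of a the longest one has the smallest total. The sketch below prints the combined scores for two of the question's strings; the reduced a and b vectors are only for illustration.

```r
a <- c('mnb345', 'tumb b~')
b <- c('mn', 'mnb', 'tum', 'tumb')

full <- adist(b, a)                  # full Levenshtein distance; rows = b, columns = a
part <- adist(b, a, partial = TRUE)  # distance from b to its best-matching substring of a
score <- full + part
dimnames(score) <- list(b, a)
score
# 'mn' and 'mnb' both occur verbatim in 'mnb345' (partial distance 0), but the
# full distance is 4 for 'mn' and only 3 for 'mnb'; likewise 4 vs 3 for 'tum'/'tumb'

picked <- b[apply(score, 2, which.min)]
picked
#[1] "mnb" "tumb"
```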
Answer 2
Score: 0
Here's a fuzzy_join solution with the function stringdist_join:
library(fuzzyjoin)
library(dplyr)  # for %>%, group_by() and slice_min()
stringdist_join(
  # join `b` as a dataframe ...
  data.frame(b),
  # ... with `a` as a dataframe:
  data.frame(a),
  # join by ...:
  by = c("b" = "a"),
  # use left join:
  mode = 'left',
  # use Jaro-Winkler distance metric:
  method = "jw",
  # enable case-insensitive matching:
  ignore_case = TRUE,
  # name for distance column:
  distance_col = 'dist') %>%
  # retain only closest matches:
  group_by(a) %>%
  slice_min(order_by = dist, n = 1)
# A tibble: 5 × 3
# Groups:   a [5]
b a dist
<chr> <chr> <dbl>
1 abc abc2 0.0833
2 lymav lymavc 0.0556
3 mnb mnb345 0.167
4 tumb tumb b~ 0.143
5 xyc xycd2 0.133
b now contains the most closely matching values for a.
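Where the dist values come from: with method = "jw" and no prefix factor p supplied, stringdist's default p = 0 means no Winkler prefix bonus is applied, so (assuming fuzzyjoin forwards these arguments to stringdist with its defaults) the dist column is the plain Jaro distance 1 - (m/|s1| + m/|s2| + (m - t)/m) / 3, where m is the number of matching characters and t is half the number of matched characters that are out of order. The base-R sketch below is for illustration only; jaro_dist is a hypothetical helper, not part of fuzzyjoin or stringdist. It reproduces the dist column for the five pairs in the table above.

```r
# Plain Jaro distance (1 - Jaro similarity), without the Winkler prefix bonus
jaro_dist <- function(s1, s2) {
  c1 <- strsplit(s1, "")[[1]]
  c2 <- strsplit(s2, "")[[1]]
  n1 <- length(c1); n2 <- length(c2)
  if (n1 == 0 && n2 == 0) return(0)
  window <- max(floor(max(n1, n2) / 2) - 1, 0)  # how far apart matches may be
  used2 <- rep(FALSE, n2)   # characters of s2 already matched
  match1 <- rep(FALSE, n1)  # characters of s1 that found a match
  for (i in seq_len(n1)) {
    lo <- max(1, i - window); hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!used2[j] && c1[i] == c2[j]) {
        used2[j] <- TRUE; match1[i] <- TRUE
        break
      }
    }
  }
  m <- sum(match1)
  if (m == 0) return(1)
  # t = half the number of matched characters that appear in a different order
  t <- sum(c1[match1] != c2[used2]) / 2
  1 - (m / n1 + m / n2 + (m - t) / m) / 3
}

d <- mapply(jaro_dist,
            c("abc", "xyc", "mnb", "tumb", "lymav"),
            c("abc2", "xycd2", "mnb345", "tumb b~", "lymavc"))
round(d, 4)
# values: 0.0833 0.1333 0.1667 0.1429 0.0556 (same as the dist column)
```

For example, 'abc' vs 'abc2' has m = 3 and t = 0, giving 1 - (3/3 + 3/4 + 3/3) / 3 = 1/12, about 0.0833.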