Conditional replacement of optional group with gsub

Question

A user asked me how to do this in https://stackoverflow.com/questions/76054997/how-to-italicize-select-words-in-a-ggplot-legend/76055093?noredirect=1#comment134133550_76055093, and I'm not happy with my workaround.

The aim is to add enclosing * around all character vector elements except for given strings. Let's assume for this example that those would always be found at the beginning. I am using an optional capture for the first group and then include the second group with the asterisks. The problem arises when the searched word stands alone and there is no following string.

I've included the desired output and some attempts in the code.

v <- head(rownames(mtcars))
## also does not work with (.*)?, nor with (.+) nor (.+)?
gsub("(Hornet |Valiant)?(.*)", "\\1*\\2*", v)
#> [1] "*Mazda RX4*"         "*Mazda RX4 Wag*"     "*Datsun 710*"       
#> [4] "Hornet *4 Drive*"    "Hornet *Sportabout*" "Valiant**"

## desired output
ifelse(grepl("Valiant", v), v, gsub("(Hornet )?(.*)", "\\1*\\2*", v))
#> [1] "*Mazda RX4*"         "*Mazda RX4 Wag*"     "*Datsun 710*"       
#> [4] "Hornet *4 Drive*"    "Hornet *Sportabout*" "Valiant"

Answer 1

Score: 3

Neither of the regex engines that can be used with gsub supports a conditional replacement pattern.

You can use

v <- c("Mazda RX4","Mazda RX4 Wag","Datsun 710","Hornet 4 Drive","Hornet Sportabout","Valiant")
gsub("^(?:Hornet|Valiant)\\s*(*SKIP)(*F)|(.+)", "*\\1*", v, perl=TRUE)

See the regex demo and the R demo online.

Output:

[1] "*Mazda RX4*"         "*Mazda RX4 Wag*"     "*Datsun 710*"       
[4] "Hornet *4 Drive*"    "Hornet *Sportabout*" "Valiant"   

To make sure Hornet and Valiant are matched only as whole words, add \b: "^(?:Hornet|Valiant)\\b\\s*(*SKIP)(*F)|(.+)".

Make sure to pass perl=TRUE: (*SKIP) and (*F) are PCRE verbs, and R's default regex engine (TRE) does not support them.

Regex details:

  • ^(?:Hornet|Valiant)\s*(*SKIP)(*F) - matches Hornet or Valiant at the start of the string, then zero or more whitespace characters; once that has matched, the match is discarded and failed, and the search for the next match resumes from the position where the failure occurred
  • | - or
  • (.+) - matches one or more characters other than line break characters, as many as possible (the rest of the string).
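
The pattern explained above can be wrapped in a small helper so that the protected leading words are passed in as a vector. This is only a sketch: the name wrap_except and its arguments are made up for illustration, and it assumes the protected words contain no regex metacharacters.

```r
# Hypothetical helper (not part of the original answer): wrap everything in
# asterisks except a protected word at the start of each string.
# Assumes the words in `keep` contain no regex metacharacters.
wrap_except <- function(x, keep) {
  alternation <- paste(keep, collapse = "|")
  # (*SKIP)(*F) discards the protected prefix; (.+) captures what is left.
  pattern <- sprintf("^(?:%s)\\b\\s*(*SKIP)(*F)|(.+)", alternation)
  gsub(pattern, "*\\1*", x, perl = TRUE)
}

v <- c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
       "Hornet 4 Drive", "Hornet Sportabout", "Valiant")
result <- wrap_except(v, keep = c("Hornet", "Valiant"))
print(result)
```

Building the alternation with paste keeps the list of exceptions in one place; if the words could contain characters such as . or (, they would need escaping first.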

Answer 2

Score: 3

One more solution is to use a possessive quantifier on the first group and a one-or-more quantifier inside the second:

^(Hornet ?|Valiant ?)?+(.+)

This way, if Hornet or Valiant is matched at the beginning of the string, no backtracking into that group will occur, and the string will be matched (and subsequently substituted) only if there is something after those words.

Demo here.
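
As a rough sketch of how this pattern might be used from R: the answer gives only the regex, so the gsub call, the replacement string "\\1*\\2*" (taken from the question) and the use of perl = TRUE are assumptions here. Possessive quantifiers are a PCRE feature, so the default engine is not expected to accept ?+.

```r
v <- c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
       "Hornet 4 Drive", "Hornet Sportabout", "Valiant")

# ?+ makes the optional first group possessive: once "Valiant" is consumed,
# the engine cannot give it back to let (.+) match, so "Valiant" alone
# produces no match and is left untouched.
pattern <- "^(Hornet ?|Valiant ?)?+(.+)"

# Replacement string borrowed from the question (an assumption for this answer).
result <- gsub(pattern, "\\1*\\2*", v, perl = TRUE)
print(result)
```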

Answer 3

Score: 2

A less in-depth and more hacky answer, but an easier one to understand.

gsub performs a substitution only when the string matches the provided regex. So to stop * from appearing, you can make the regex fail to match your input.

For the example provided in the question, you can do this with a negative lookahead. The result would look like this:

^(?!(?:Hornet|Valiant)$)(Hornet|Valiant)?(.*)$

Demo here.
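
A possible way to try the lookahead idea in R is sketched below. The answer supplies only the pattern, so several details are assumptions: perl = TRUE (lookaheads need PCRE), the replacement string from the question, and an optional trailing space added inside the first group (as in the possessive-quantifier pattern above) so that the space stays outside the asterisks.

```r
v <- c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
       "Hornet 4 Drive", "Hornet Sportabout", "Valiant")

# The negative lookahead rejects strings that consist of a protected word only.
# Adaptation (assumption): " ?" inside group 1 keeps the separating space
# out of group 2.
pattern <- "^(?!(?:Hornet|Valiant)$)(Hornet ?|Valiant ?)?(.*)$"

result <- gsub(pattern, "\\1*\\2*", v, perl = TRUE)
print(result)
```

Because the whole pattern is anchored with ^ and $, a string such as "Valiant" simply fails to match and is returned unchanged.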

Posted by huangapple on April 19, 2023, 22:34:29
Source link: https://go.coder-hub.com/76055755.html