Conditional replacement of optional group with gsub
Question
A user asked me how to do this in https://stackoverflow.com/questions/76054997/how-to-italicize-select-words-in-a-ggplot-legend/76055093?noredirect=1#comment134133550_76055093, and I'm not happy with my workaround.
The aim is to add enclosing * around all character vector elements, except for given strings, which should stay outside the asterisks. Let's assume for this example that those strings are always found at the beginning. I am using an optional capture for the first group and then wrapping the second group in asterisks. The problem arises when the searched word stands alone and there is no following string.
I've included the desired output and some attempts in the code.
v <- head(rownames(mtcars))
## also does not work with (.*)?, (.+), or (.+)?
gsub("(Hornet |Valiant)?(.*)", "\\1*\\2*", v)
#> [1] "*Mazda RX4*" "*Mazda RX4 Wag*" "*Datsun 710*"
#> [4] "Hornet *4 Drive*" "Hornet *Sportabout*" "Valiant**"
## desired output
ifelse(grepl("Valiant", v), v, gsub("(Hornet )?(.*)", "\\1*\\2*", v))
#> [1] "*Mazda RX4*" "*Mazda RX4 Wag*" "*Datsun 710*"
#> [4] "Hornet *4 Drive*" "Hornet *Sportabout*" "Valiant"
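To see why the last element ends up as Valiant**, the following sketch (using base R's regexec() and regmatches(), which the question itself does not use) lists what each capture group holds for that element. The optional first group consumes the whole word, the second group matches the empty string, and the replacement still emits both asterisks.

```r
v <- head(rownames(mtcars))

# Same pattern and default (TRE) engine as the question's gsub() call.
pattern <- "(Hornet |Valiant)?(.*)"

# regexec() records the full match and each capture group;
# regmatches() turns those positions into strings.
groups <- regmatches(v, regexec(pattern, v))

# Element 6 is "Valiant": full match, then group 1, then group 2.
groups[[6]]
# c("Valiant", "Valiant", "")

# An empty group 2 still yields "Valiant" + "*" + "" + "*".
gsub(pattern, "\\1*\\2*", v)[6]
# "Valiant**"
```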
Answer 1
Score: 3
Neither of the regex engines that can be used with gsub (TRE, the default, and PCRE, enabled with perl=TRUE) supports a conditional replacement pattern.
You can use
v <- c("Mazda RX4","Mazda RX4 Wag","Datsun 710","Hornet 4 Drive","Hornet Sportabout","Valiant")
gsub("^(?:Hornet|Valiant)\\s*(*SKIP)(*F)|(.+)", "*\\1*", v, perl=TRUE)
See the regex demo and the R demo online.
Output:
[1] "*Mazda RX4*" "*Mazda RX4 Wag*" "*Datsun 710*"
[4] "Hornet *4 Drive*" "Hornet *Sportabout*" "Valiant"
To make sure the first words are matched as whole words, add \b: "^(?:Hornet|Valiant)\\b\\s*(*SKIP)(*F)|(.+)".
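A small sketch of what the \b changes. The input Valiantly is made up for illustration and is not in mtcars. Without \b, the exempt branch also fires on the prefix of a longer word, so only the remainder of that word gets wrapped.

```r
# Hypothetical inputs: "Valiantly" only illustrates a partial-word prefix.
w <- c("Valiant", "Valiantly", "Hornet 4 Drive")

# Without \b, "Valiant" is skipped even when it is only the start of a word.
no_b <- gsub("^(?:Hornet|Valiant)\\s*(*SKIP)(*F)|(.+)", "*\\1*", w, perl = TRUE)
# c("Valiant", "Valiant*ly*", "Hornet *4 Drive*")

# With \b, the exempt branch requires a word boundary after the prefix.
with_b <- gsub("^(?:Hornet|Valiant)\\b\\s*(*SKIP)(*F)|(.+)", "*\\1*", w, perl = TRUE)
# c("Valiant", "*Valiantly*", "Hornet *4 Drive*")
```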
Make sure to use perl=TRUE, since (*SKIP) and (*F) are PCRE verbs that the default TRE engine does not support.
Regex details:
- ^(?:Hornet|Valiant)\s*(*SKIP)(*F) - matches Hornet or Valiant at the start of the string, then zero or more whitespaces; once matched, the match is discarded and failed, and the search for the next match proceeds from the failure position
- | - or
- (.+) - matches one or more chars other than line break chars, as many as possible (the rest of the string).
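The same pattern can be generated from a vector of exempt words. The sketch below wraps it in a hypothetical helper, wrap_except() (not part of the answer or of base R), and assumes the exempt words are plain words that contain no regex metacharacters and begin and end with word characters, so that the \b check behaves as described.

```r
# Hypothetical helper (not from the answer): wraps each element of `x` in
# asterisks unless it starts with one of the exempt words in `keep`, in
# which case only the text after that word (if any) is wrapped.
# Assumption: `keep` holds plain words without regex metacharacters.
wrap_except <- function(x, keep) {
  pattern <- paste0(
    "^(?:", paste(keep, collapse = "|"), ")\\b\\s*(*SKIP)(*F)|(.+)"
  )
  # (*SKIP)(*F) are PCRE verbs, so perl = TRUE is required.
  gsub(pattern, "*\\1*", x, perl = TRUE)
}

v <- head(rownames(mtcars))
res <- wrap_except(v, c("Hornet", "Valiant"))
# res is c("*Mazda RX4*", "*Mazda RX4 Wag*", "*Datsun 710*",
#          "Hornet *4 Drive*", "Hornet *Sportabout*", "Valiant")
```

Building the alternation with paste(..., collapse = "|") keeps the exempt list as data, so adding another leading word does not require editing the regex by hand.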
Answer 2
Score: 3
One more solution is to use a possessive quantifier for the first group and a one-or-more quantifier inside the second:
^(Hornet ?|Valiant ?)?+(.+)
This way, if Hornet or Valiant is matched at the beginning of the string, no backtracking will occur, and the string will be matched (and subsequently substituted) only if there is something after it.
Demo here.
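A runnable sketch of this pattern in R. The answer gives only the regex, so two details here are assumptions: the replacement string "\\1*\\2*" is borrowed from the question's own attempt, and perl = TRUE is used because possessive quantifiers are a PCRE feature.

```r
v <- head(rownames(mtcars))

# (...)?+ is a possessive "zero or one": once "Hornet " or "Valiant" has been
# consumed at the start, the engine never gives it back, so (.+) must match
# at least one further character or the whole match fails.
out <- gsub("^(Hornet ?|Valiant ?)?+(.+)", "\\1*\\2*", v, perl = TRUE)
# out is c("*Mazda RX4*", "*Mazda RX4 Wag*", "*Datsun 710*",
#          "Hornet *4 Drive*", "Hornet *Sportabout*", "Valiant")
```

For rows where the optional group does not participate (the Mazda and Datsun rows), \\1 is substituted as an empty string.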
Answer 3
Score: 2
A less in-depth and more hacky answer, but an easier one to understand:
gsub performs a substitution only when the string matches the provided regex. So, to stop * from appearing, you can make the regex stop matching your input.
For the example provided in the question, you can do this with a negative lookahead. The result would look like this:
^(?!(?:Hornet|Valiant)$)(Hornet|Valiant)?(.*)$
Demo here.
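A runnable sketch of the lookahead approach in R. Two assumptions go beyond the answer: the replacement "\\1*\\2*" is borrowed from the question, and the optional group is written as (Hornet |Valiant)? with a trailing space, as in the question's pattern. With (Hornet|Valiant)? exactly as written above, the Hornet rows come out as "Hornet* 4 Drive*" and "Hornet* Sportabout*". perl = TRUE is required because the default TRE engine does not support lookaheads.

```r
v <- head(rownames(mtcars))

# (?!(?:Hornet|Valiant)$) rejects strings that consist of nothing but an
# exempt word, so gsub() finds no match there and returns them unchanged.
# The trailing space in "Hornet " keeps it outside the asterisks
# (an adjustment taken from the question's pattern, not from the answer).
pattern <- "^(?!(?:Hornet|Valiant)$)(Hornet |Valiant)?(.*)$"
out <- gsub(pattern, "\\1*\\2*", v, perl = TRUE)
# out is c("*Mazda RX4*", "*Mazda RX4 Wag*", "*Datsun 710*",
#          "Hornet *4 Drive*", "Hornet *Sportabout*", "Valiant")
```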