Rearrange a string using regex

Question

I have an expression as a string, like the following:

orig <- "mean(Sepal.Length, na.rm = TRUE)"
orig
#> [1] "mean(Sepal.Length, na.rm = TRUE)"

I would like to rearrange this string so that I get the following output:

"Sepal.Length$mean(na.rm = TRUE)"
#> [1] "Sepal.Length$mean(na.rm = TRUE)"

I know that I can use capture groups like this:

gsub("(Sepal.Length)", "\\1\\$", orig)
#> [1] "mean(Sepal.Length$, na.rm = TRUE)"

but this does not move text within the string:

gsub("(Sepal.Length)(.*)", "\\1\\$\\2", orig)
#> [1] "mean(Sepal.Length$, na.rm = TRUE)"
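
The output is unchanged because the pattern only matches from Sepal.Length onward, so the leading "mean(" is never part of the match and gsub() copies it through as-is. A minimal sketch that inspects the match with regexec() and regmatches(); the variable names m and unmatched are illustrative only:

```r
orig <- "mean(Sepal.Length, na.rm = TRUE)"

# regexec() reports the full match followed by each capture group
m <- regmatches(orig, regexec("(Sepal.Length)(.*)", orig))[[1]]
m
# m[1] is the full match:  "Sepal.Length, na.rm = TRUE)"
# m[2] is capture group 1: "Sepal.Length"
# m[3] is capture group 2: ", na.rm = TRUE)"

# Everything before the match is outside the pattern and stays where it is
unmatched <- substr(orig, 1, regexpr("Sepal.Length", orig, fixed = TRUE) - 1)
unmatched
# "mean("
```

To move "mean" behind Sepal.Length, the pattern has to capture that leading part as well.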

A related question was helpful, but the solution there is hardcoded, whereas here I don't know the expression in advance, only that it will contain Sepal.Length. The expression above could be "sum(Sepal.Length)", for example.

I’m looking for a solution in base R only.

Answer 1

Score: 2

You can use the following pattern:

gsub("(.+)\\(Sepal\\.Length,? *(.*)\\)", "Sepal.Length$\\1(\\2)", orig)
  • (.+) captures everything before the opening parenthesis \\(;
  • Then we always have "Sepal.Length". Note that . is a special character, so matching a literal dot requires \\.;
  • Next there may be a comma and space(s): ,? * (? means "0 or 1 times", and * means "0 or more times");
  • After that come any other arguments, captured by (.*), followed by the closing parenthesis \\).

Edit: Thanks to @rps1227 for the suggested improvements.
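
The pattern can be checked against both inputs from the question. This is a minimal sketch, assuming the replacement string uses the backreferences \\1 and \\2 for the two capture groups; rearrange is a hypothetical helper name, not part of the answer:

```r
# Hypothetical wrapper around the pattern described above
rearrange <- function(x) {
  gsub("(.+)\\(Sepal\\.Length,? *(.*)\\)", "Sepal.Length$\\1(\\2)", x)
}

rearrange("mean(Sepal.Length, na.rm = TRUE)")
#> [1] "Sepal.Length$mean(na.rm = TRUE)"

# No comma and no further arguments: ",? *" and "(.*)" both match nothing
rearrange("sum(Sepal.Length)")
#> [1] "Sepal.Length$sum()"
```

Because ,? and * are both optional, the same pattern handles calls with and without extra arguments.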

Answer 2

Score: 2

Parse the expression into a call object p. Then p[[2]] is the first argument: Sepal.Length in the first example, but it could be something else, as in the second example. Convert p to a list, set its second element to NULL to remove it, convert the result back to a call object, and format that as a string. Finally, paste p[[2]] onto the front using $ as the separator. No packages or regular expressions are used, and the approach does not depend on whether the first argument is Sepal.Length or something else.

f <- function(orig) {
  p <- str2lang(orig)
  paste(p[[2]], format(as.call(replace(as.list(p), 2, NULL))), sep = "$")
}

orig <- "mean(Sepal.Length, na.rm = TRUE)"
f(orig)
## [1] "Sepal.Length$mean(na.rm = TRUE)"

orig2 <- "sum(Sepal.Width)"
f(orig2)
## [1] "Sepal.Width$sum()"
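
To see why indexing works here, it can help to look at the parsed call step by step. A minimal sketch, assuming R 3.6.0 or later for str2lang(); the names parts and rest are illustrative only:

```r
p <- str2lang("mean(Sepal.Length, na.rm = TRUE)")
class(p)
# "call"

# A call behaves like a list: the function name first, then the arguments
parts <- as.list(p)
parts[[1]]    # the symbol mean
parts[[2]]    # the symbol Sepal.Length
names(parts)  # "" "" "na.rm"

# Dropping position 2 removes the first argument, which has the same
# effect as replace(as.list(p), 2, NULL) in f()
rest <- as.call(parts[-2])
format(rest)
# "mean(na.rm = TRUE)"
```

Since the first argument is found by position rather than by name, the same steps apply to any call of this shape.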

Answer 3

Score: 1

If you know "Sepal.Length" in advance then you don't need regex to stick it on the front, you can paste it there:

pattern = "Sepal.Length"

result = sub(pattern = paste0(pattern, ", "), replacement = "", x = orig, fixed = TRUE)
result = paste0(pattern, "$", result)
result
# [1] "Sepal.Length$mean(na.rm = TRUE)"
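
Note that the fixed pattern "Sepal.Length, " includes the comma, so an input with a single argument such as "sum(Sepal.Length)" would be left unchanged by sub(). A minimal sketch of one way to cover that case too, still without regex; move_first and its target argument are hypothetical names, not part of the answer:

```r
# Hypothetical helper: move a known variable name in front of the call
move_first <- function(orig, target = "Sepal.Length") {
  # First try to drop "target, " (the variable followed by more arguments)
  res <- sub(paste0(target, ", "), "", orig, fixed = TRUE)
  # If nothing changed, the variable is the only argument: drop it alone
  if (identical(res, orig)) {
    res <- sub(target, "", orig, fixed = TRUE)
  }
  paste0(target, "$", res)
}

move_first("mean(Sepal.Length, na.rm = TRUE)")
# [1] "Sepal.Length$mean(na.rm = TRUE)"
move_first("sum(Sepal.Length)")
# [1] "Sepal.Length$sum()"
```

This sketch assumes the input really contains the target; if it does not, the target is still pasted on the front.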

huangapple
  • Posted on June 5, 2023, 22:14:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76407339.html