Sed regexp: match only non-valid C++ identifier characters to rename a variable

Question

I want to use sed to rename variables (identifiers). I want to do it for C++, but for other languages it will be similar. Say we have a code sample like this:
example.cpp

int hi;
int bye;
// ... a lot of code with many occurrences of hi

Assume that, for whatever reason, I want to rename hi to hello. The problem is that hi can occur as part of other words. In C++ a valid identifier has the following form: [[:alpha:]_][[:alnum:]_]*
(Setting aside extended characters like ä. I do not know whether [:alnum:] includes these; if it does, they are no problem, except maybe for extended punctuation characters, but who uses them?)
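As a quick check of that identifier rule, here is a minimal sketch that uses grep -o to print every token matching [[:alpha:]_][[:alnum:]_]* (a letter or underscore, followed by any number of letters, digits, or underscores). The sample line and the variable name are illustrative. Keywords such as int match too, because the pattern knows nothing about C++ grammar.

```shell
#!/bin/sh
# Illustrative input; prints each identifier-like token on its own line.
# [[:alpha:]_] is the first character, [[:alnum:]_]* the (possibly empty) rest.
tokens=$(printf 'int hi = this_hi + _x2;\n' | grep -oE '[[:alpha:]_][[:alnum:]_]*')
printf '%s\n' "$tokens"
# int
# hi
# this_hi
# _x2
```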

There must be a character outside this expression next to a valid identifier to distinguish it from other identifiers. So directly before and after hi, a character from [[:alnum:]_] is not allowed, while any other character is. Another problem is strings in "". All of this only works if strings always fit on one line. Then we would have to check for an odd number of unescaped " characters, and it may be a mathematical question whether regular expressions can do that at all. However, I did not get that far. My first attempt, without any string recognition, was:

sed -i -e 'hi/\([^[[:alnum:]]_]\)hello\([^[[:alnum:]]_]\)/r/g' example.cpp

It didn't change anything.

Answer 1

Score: 1

Your sed command is garbled: there's no s/// substitution.

Anyway, all you need are word boundaries (\b) on the match side of the substitution:

sed 's/\bhi\b/hello/g' example.cpp

Above does almost the same as this:

sed -E 's/([^[:alnum:]_])hi([^[:alnum:]_])/\1hello\2/g' example.cpp

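... except that this version needs a real non-word character on each side of hi (the match groups must be of nonzero size), so it misses hi at the very start or end of a line.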
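A minimal sketch of that difference, assuming GNU sed (\b is a GNU extension). It uses a made-up input line in which hi both starts the line and appears again later, and the variable names are illustrative. It also shows that the capture-group replacement has to write the captured characters back with \1 and \2. Otherwise they disappear from the output.

```shell
#!/bin/sh
# Illustrative input; assumes GNU sed.
input='hi = x; int hi;'
# Word boundaries are zero-width, so nothing around hi is consumed.
wb=$(printf '%s\n' "$input" | sed 's/\bhi\b/hello/g')
# Capture groups without \1 and \2: the space and ';' around the second hi are lost.
lost=$(printf '%s\n' "$input" | sed -E 's/([^[:alnum:]_])hi([^[:alnum:]_])/hello/g')
# Capture groups written back: delimiters kept, but the hi at line start is still missed.
kept=$(printf '%s\n' "$input" | sed -E 's/([^[:alnum:]_])hi([^[:alnum:]_])/\1hello\2/g')
printf '%s\n' "$wb" "$lost" "$kept"
# hello = x; int hello;
# hi = x; inthello
# hi = x; int hello;
```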

More discussion of word boundaries here.

Note also that your character classes have more square brackets than needed. The negation of [[:alnum:]] is [^[:alnum:]], so your non-word character class should be [^[:alnum:]_]. That is equivalent to \W, a GNU extension that GNU sed accepts in both basic and extended (ERE) mode, so you can also do this with sed -E:

sed -E 's/(\W)hi(\W)/\1hello\2/g' example.cpp

... again with the caveat that hi must have a non-word character both before and after it (which may be a safe assumption for a C variable).

To fix that, you can also add the line-start ^ and line-end $ cases, which allow a zero-size match there:

sed -E 's/(^|\W)hi(\W|$)/\1hello\2/g' example.cpp

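(This usually works just as well as sed 's/\bhi\b/hello/g'. The exception is two occurrences separated by a single character, as in hi+hi: the first match consumes the +, so the second hi is left unchanged.)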
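A quick comparison on a made-up line containing hi+hi shows where the two still differ. It assumes GNU sed, and the variable names are illustrative. The ERE consumes the + with its first match. Because \b is zero-width, it consumes nothing.

```shell
#!/bin/sh
# Illustrative input; assumes GNU sed.
input='x = hi+hi;'
# The '+' is eaten by the first match, leaving no delimiter for the second hi.
ere=$(printf '%s\n' "$input" | sed -E 's/(^|\W)hi(\W|$)/\1hello\2/g')
# \b consumes nothing, so both occurrences are replaced.
wb=$(printf '%s\n' "$input" | sed 's/\bhi\b/hello/g')
printf '%s\n' "$ere" "$wb"
# x = hello+hi;
# x = hello+hello;
```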

Or you can use Perl regular expressions to turn the groups into non-consuming lookbehind (?<=) and lookahead (?=) assertions:

perl -pe 's/(?<=\W)hi(?=\W)/hello/g' example.cpp

Nearly the same, but with the character classes inverted and the lookbehind and lookahead negated, which also lets hi match at the very start of a line:

perl -pe 's/(?<!\w)hi(?!\w)/hello/g' example.cpp

As you climb the GNU regex feature set, you can test the matching for each flavor with grep:

$ grep --color '\bhi\b' example.cpp
$ grep -E --color '(^|\W)hi(\W|$)' example.cpp
$ grep -P --color '(?<!\w)hi(?!\w)' example.cpp

... so you will see hi highlighted in color using basic (BRE), extended (ERE), and Perl-compatible (PCRE) regexes, all supported by GNU grep. (The ERE above also highlights the non-word characters, if any, before or after hi.)
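To see exactly what each pattern consumes, and not just what gets highlighted, grep -o prints only the matched text. A minimal sketch with a made-up two-line file, assuming GNU grep. The -P line additionally requires a grep built with PCRE support, and the variable names are illustrative.

```shell
#!/bin/sh
# Illustrative two-line file in a temporary location.
tmp=$(mktemp)
printf 'int hi;\nhi = high;\n' > "$tmp"
bre=$(grep -o '\bhi\b' "$tmp")              # just "hi", twice
ere=$(grep -oE '(^|\W)hi(\W|$)' "$tmp")     # " hi;" and "hi ": delimiters included
pcre=$(grep -oP '(?<!\w)hi(?!\w)' "$tmp")   # just "hi", twice (needs PCRE-enabled grep)
rm -f "$tmp"
printf 'BRE:\n%s\nERE:\n%s\nPCRE:\n%s\n' "$bre" "$ere" "$pcre"
```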

But GNU's basic, extended, and PCRE flavors all support the convenient zero-size word-boundary match \b (it is not part of plain POSIX regular expressions). So, use it.

huangapple
  • Posted on August 4, 2023 at 01:08:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76830238.html