Why is sed with the 'g' modifier not replacing all matches?

Question

sed with the 'g' modifier is not replacing all matches.

Example:

$ sed -e 's:foo\([^\s]*\):\1:g' <(echo "hey foobar foobar ")

Expected:

"hey bar bar"

Actual:

"hey bar foobar"

What am I missing here?

Answer 1

Score: 5

\s is a PCRE extension that POSIX sed isn't guaranteed to implement. When it isn't honored, as is the case inside your bracket expression, [^\s]* stops only at the letter s or at a backslash, not at spaces.

You get correct behavior if instead you use:

sed -re 's:foo([^[:space:]]*):\1:g'

...or...

sed -e 's:foo\([^[:space:]]*\):\1:g'
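
As a quick check of the fix, here is a minimal sketch that runs both forms against the input from the question. It assumes the replacement is the back-reference \1 (which the expected output "hey bar bar" implies), and it spells the extended-regex flag as -E, which both GNU and BSD sed accept, rather than the -r used above. The variable names are only illustrative.

```shell
# Input from the question (note the trailing space).
input='hey foobar foobar '

# Basic regular expression: [[:space:]] is a POSIX character class, so the
# capture now stops at the first whitespace character after "foo".
bre=$(printf '%s\n' "$input" | sed -e 's:foo\([^[:space:]]*\):\1:g')

# Same substitution written as an extended regular expression.
ere=$(printf '%s\n' "$input" | sed -E -e 's:foo([^[:space:]]*):\1:g')

# Brackets make the trailing space visible.
printf '[%s]\n' "$bre"   # prints [hey bar bar ]
printf '[%s]\n' "$ere"   # prints [hey bar bar ]
```

Both occurrences of foo are removed because each match now ends at the following space, leaving the rest of the line available for the next match.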

To better see what's going on for yourself, you can put sigils around the replacement string to see exactly what is being captured:

$ sed -e 's:foo\([^\s]*\):<\1>:g' <<<"hey foobar foobar "
hey <bar foobar  >
$ sed -e 's:foo\([^\s]*\):<\1>:g' <<<"hey foobar foobar s foobar foobar "
hey <bar foobar  >s <bar foobar  >
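
For comparison, the same sigil trick can be applied to the corrected pattern. This is a small sketch, again assuming the replacement wraps the back-reference \1 in angle brackets; the variable name is illustrative.

```shell
# Mark each capture with <...> using the corrected bracket expression.
marked=$(printf '%s\n' 'hey foobar foobar ' |
  sed -e 's:foo\([^[:space:]]*\):<\1>:g')

# Brackets make the trailing space visible.
printf '[%s]\n' "$marked"   # prints [hey <bar> <bar> ]
```

Each capture is now just "bar", which shows that the match ends at the whitespace instead of running on to the next s or backslash.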

Answer 2

Score: 1

Your problem is with the [^\s]*; it doesn't do what you think it does. Inside a list (bracket expression), \s is not expanded to "any whitespace" (see below), so this matches any character that is neither \ nor s. Because your input contains neither of those, the first match runs from the first foo to the end of the line: it consumes all of foobar foobar and replaces it with bar foobar. Replace it with [^ ]* or, better yet, \S* (a GNU extension).

From the GNU sed manual (note that other versions of sed may act differently):

> The characters $, *, ., [, and \ are normally not special within list.
> For example, [\*] matches either ‘\’ or ‘*’, because the \ is not
> special here. However, strings like [.ch.], [=a=], and [:space:] are
> special within list and represent collating symbols, equivalence
> classes, and character classes, respectively, and [ is therefore
> special within list when it is followed by ., =, or :. Also, when not
> in POSIXLY_CORRECT mode, special escapes like \n and \t are recognized
> within list. See Escapes.
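
The rule quoted above can be checked directly. The sketch below uses a made-up test string and, under the assumption of GNU sed's documented behavior, shows that [^\s] excludes only the backslash and the letter s; it then applies the [^ ]* alternative to the question's input, assuming the replacement there is the back-reference \1.

```shell
# Hypothetical test string containing a space, an s, and a backslash.
sample='a s\b'

# Replace every character matched by [^\s] with X. Only the s and the
# backslash are excluded from the match; the space is replaced too.
masked=$(printf '%s\n' "$sample" | sed 's/[^\s]/X/g')
printf '%s\n' "$masked"   # prints XXs\X

# Alternative from the answer: stop the capture at a literal space.
fixed=$(printf '%s\n' 'hey foobar foobar ' | sed -e 's:foo\([^ ]*\):\1:g')
printf '[%s]\n' "$fixed"  # prints [hey bar bar ]
```

Note that [^ ]* stops only at a literal space, so tabs would still be captured; [^[:space:]]* covers all whitespace.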

huangapple
  • Published on 2023-06-01 01:19:37
  • When reposting, please keep this link: https://go.coder-hub.com/76375925.html