Why is sed with 'g' modifier not replacing all matches
Question
sed with the 'g' modifier is not replacing all matches.
Example:
$ sed -e 's:foo\([^\s]*\):\1:g' <(echo "hey foobar foobar ")
Expected:
"hey bar bar"
Actual:
"hey bar foobar"
What am I missing here?
Answer 1
Score: 5
\s is a PCRE-style extension that POSIX sed isn't guaranteed to implement, and it is not honored inside a bracket expression. There the backslash is literal, so [^\s]* stops only at the letter s or at a backslash, not at spaces.
You get correct behavior if instead you use:
sed -re 's:foo([^[:space:]]*):\1:g'
...or...
sed -e 's:foo\([^[:space:]]*\):\1:g'
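As a quick check, the following sketch runs both bracket expressions on the question's input. It assumes the replacement in the question is the back-reference \1 (which is what the expected and actual outputs imply); the variable names input, broken, and fixed are only illustrative.

```shell
#!/bin/sh
# Input from the question; the trailing space is intentional.
input='hey foobar foobar '

# Inside a bracket expression the backslash is literal, so [^\s] means
# "any character except a backslash or the letter s".
broken=$(printf '%s\n' "$input" | sed -e 's:foo\([^\s]*\):\1:g')

# [:space:] is the POSIX character class for whitespace.
fixed=$(printf '%s\n' "$input" | sed -e 's:foo\([^[:space:]]*\):\1:g')

# Brackets in the output make the trailing space visible.
printf 'broken: [%s]\n' "$broken"
printf 'fixed:  [%s]\n' "$fixed"
```

With the first expression the capture swallows everything up to the end of the line, so the 'g' flag never gets a second match to work on.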
To better see what's going on for yourself, you can put sigils around the back-reference in the replacement string, which shows exactly what is being captured:
$ sed -e 's:foo\([^\s]*\):<\1>:g' <<<"hey foobar foobar "
hey <bar foobar >
$ sed -e 's:foo\([^\s]*\):<\1>:g' <<<"hey foobar foobar s foobar foobar "
hey <bar foobar >s <bar foobar >
Answer 2
Score: 1
Your problem is with the [^\s]*: it doesn't do what you think it does. It matches any character that is not \ or s (\s is not expanded to mean whitespace inside a list; see below). Because the input contains neither of those characters, the pattern matches all of foobar foobar and replaces it with bar foobar. Replace it with [^ ]* or, better yet, \S* (a GNU extension).
From the GNU sed manual (note that other versions of sed may behave differently):
> The characters $, *, ., [, and \ are normally not special within list.
> For example, [\*] matches either ‘\’ or ‘*’, because the \ is not
> special here. However, strings like [.ch.], [=a=], and [:space:] are
> special within list and represent collating symbols, equivalence
> classes, and character classes, respectively, and [ is therefore
> special within list when it is followed by ., =, or :. Also, when not
> in POSIXLY_CORRECT mode, special escapes like \n and \t are recognized
> within list. See Escapes.
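The rule quoted above can be seen directly with a small sketch. The sample string and variable names are made up for illustration; the input contains one backslash, one letter s, and two spaces.

```shell
#!/bin/sh
# Illustrative input: a, space, s, backslash, b, space, c.
sample='a s\b c'

# [\s] is a list containing a literal backslash and the letter s,
# so only those two characters are replaced; the spaces are untouched.
literal=$(printf '%s\n' "$sample" | sed -e 's/[\s]/_/g')

# [[:space:]] is the POSIX whitespace class, so only the spaces are replaced.
spaces=$(printf '%s\n' "$sample" | sed -e 's/[[:space:]]/_/g')

printf '%s\n' "$literal"
printf '%s\n' "$spaces"
```

The first command leaves both spaces in place, which is the same behavior that made the question's capture group run past the first word.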