sed giving error when repetition is used in powershell
Question
I was writing a PowerShell script and need to use `sed` to extract part of the output of another command. It looks something like this:

    echo "d6121090" | sed -E "s/^d6(.*)10.*/\1/"
    12

But if I replace `.*` in the `sed` script with `.{0,}` or with `.{0,2}`, `sed` gives me an error:

    echo "d6121090" | sed -E "s/^d6(.{0,})10.*/\1/"
    /usr/bin/sed: can't read s/^d6(.)10.*//: No such file or directory
    echo "d6121090" | sed -E "s/^d6(.{0,2})10.*/\1/"
    /usr/bin/sed: can't read s/^d6(.2)10.*//: No such file or directory

I'm not sure whether the error has something to do with `sed` or with PowerShell.

I'm using the `sed` provided by Cygwin (package `4.9-1`); the version is `(GNU sed) 4.9`.

If there is a better way to extract part of a string, please mention that as well.
Answer 1
Score: 3
**tl;dr**

**Workarounds**, in descending order of preference (note that `echo "d6121090"` was simplified to just `'d6121090'`, using PowerShell's _implicit output behavior_; see [this answer](https://stackoverflow.com/a/69792182/45375)):

* Use **`'...'` quoting embedded inside `"..."`**:

        'd6121090' | sed -E "'s/^d6(.{0,2})10.*/\1/'"

    * Note: This takes advantage of the fact that, unusually, the Cygwin-provided executables also recognize `'...'` quoting on their command lines.
    * Using `'"..."'` instead (quotes swapped), surprisingly, works as-is up to PowerShell v7.2.x; it should never have worked, but did due to a long-standing bug with respect to passing arguments with embedded `"` chars. to external programs. This was (mostly) fixed in v7.3, where you can opt into the old, broken behavior with `$PSNativeCommandArgumentPassing = 'Legacy'`; see this answer for details.

* Use an **extra space** to force PowerShell to _quote_ the `sed` script on the process command line built behind the scenes (see next section).

        'd6121090' | sed -E 's/^d6(.{0,2})10.*/\1/ ' # <- Note the trailing space

* Call via `cmd.exe`, which allows you to control quoting _explicitly_ (the spaces around `"` are just for readability).

        'd6121090' | cmd /c " sed -E 's/^d6(.{0,2})10.*/\1/' "

* Use `--%`, the [stop-parsing token](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Parsing#the-stop-parsing-token), to pass the remainder of the command line _as-is_, but note its _fundamental limitations_, discussed in the bottom section of [this answer](http://stackoverflow.com/a/42601912/45375).

        'd6121090' | sed -E --% 's/^d6(.{0,2})10.*/\1/'

---
#### Background information:

You're seeing the **confluence of two surprising behaviors**:

* When you invoke a **Cygwin** utility such as `sed` from a Windows shell such as PowerShell, the command line is interpreted as if it had been submitted from a POSIX-compatible shell such as Bash.
    * On the plus side, this allows you to use `'...'` quoting even when calling from `cmd.exe`, even though CLIs on Windows normally only understand `"..."` quoting.

* **PowerShell**, of necessity (on Windows), rebuilds the command line it uses to launch child processes behind the scenes, because its own command-line syntax, notably the ability to use `'...'` strings, cannot be expected to be understood by outside CLIs (which on Windows are expected to recognize `"..."` strings only).
    * However, PowerShell employs _on-demand_ double-quoting when it rebuilds the command line, based solely on whether an argument contains spaces. Therefore, what was verbatim `"s/^d6(.{0,2})10.*/\1/"` on the original command line is placed as verbatim `s/^d6(.{0,2})10.*/\1/`, without quoting, on the process command line.
    * This is normally not a problem, given that most CLIs use their arguments _verbatim_ rather than subjecting them to shell-like interpretation.
        * It is, however, also a problem with `cmd.exe`, both in direct `cmd /c` calls and indirectly, when calling batch files; [GitHub issue #15143](https://github.com/PowerShell/PowerShell/issues/15143), among other suggested improvements, proposed modifying PowerShell to accommodate this quirk and double-quote even space-less arguments if they contain `cmd.exe` metacharacters, but it looks like such improvements won't be implemented.

Therefore, the command line that PowerShell actually submits is the following; note the absence of quoting:

    sed -E s/^d6(.{0,2})10.*/\1/
The lack of quoting around the `sed` script causes Cygwin to treat `{` and `}` as a Bash brace expansion expression, which therefore expands to _multiple_ arguments, with the extra argument getting interpreted as a (non-existent) filename.

You can verify this as follows, using Cygwin's `printf.exe`:

    # From PowerShell
    printf '%s\n' 's/^d6(.{0,2})10.*/\1/'

Output:

    s/^d6(.0)10.*/1/
    s/^d6(.2)10.*/1/

That is, the brace expansion turned the one argument containing `.{0,2}` into _two_ arguments, one containing `.0` and one containing `.2`, each with the same prefix and suffix strings, and the second argument was then interpreted as a filename.
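The same kind of expansion can be reproduced in Bash itself. The following is a minimal sketch, not part of the original answer: the placeholder word `x.{0,2}y` stands in for the real `sed` script (whose unescaped parentheses would be a syntax error if left unquoted in Bash), and the variable names are purely illustrative.

```bash
#!/usr/bin/env bash
# Brace expansion is a Bash feature, so this must run under Bash (not plain sh).

# Unquoted: the comma-separated list inside {...} expands into one word per item.
unquoted=$(printf '%s\n' x.{0,2}y)

# Single-quoted: the braces are literal and the argument stays a single word.
quoted=$(printf '%s\n' 'x.{0,2}y')

# An empty list item is allowed too, which is why .{0,} is affected as well.
open_ended=$(printf '%s\n' x.{0,}y)

printf 'unquoted:\n%s\nquoted:\n%s\nopen-ended:\n%s\n' \
  "$unquoted" "$quoted" "$open_ended"
```

The unquoted forms yield two words each, while the quoted form stays intact, which is consistent with the two-argument `printf.exe` output above.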
There are several workarounds, as shown above, but the simplest one in this case is to _append a space to your `sed` script_, which doesn't interfere with the script's function, but forces PowerShell to enclose the script in `"..."` behind the scenes (in cases where adding an extra space _would_ interfere with the intended functionality, such as when passing a space-less search pattern to `grep`, use the `"'...'"` technique shown at the top):

    'd6121090' | sed -E 's/^d6(.{0,2})10.*/\1/ ' # <- Note the trailing space
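To check that the trailing space really is harmless to the `sed` script, the two variants can be compared directly in a POSIX-style shell, where the quoting problem doesn't arise. This is a small sketch that assumes a `sed` supporting `-E` (such as the GNU sed 4.9 mentioned in the question) is on the `PATH`; the variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Assumes a sed that supports -E (e.g. GNU sed) is available on PATH.
input='d6121090'

# The script exactly as intended.
plain=$(printf '%s\n' "$input" | sed -E 's/^d6(.{0,2})10.*/\1/')

# The same script with a trailing space, as in the workaround.
padded=$(printf '%s\n' "$input" | sed -E 's/^d6(.{0,2})10.*/\1/ ')

# Both variables should hold the same extracted substring.
printf 'plain=%s padded=%s\n' "$plain" "$padded"
```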
Note:

* `'...'` quoting is used on the PowerShell side, which is generally preferable when you're dealing with verbatim (literal) values.

* The `\` in `\1` is _not_ escaped; to PowerShell itself, `\` is never special.
    * See the bottom section for more information about PowerShell string literals.

* Even though the `'...'` quoting gets translated to `"..."` quoting behind the scenes, `\` also does not require escaping on the Cygwin side (though doing so would also work). This, along with the fact that an unquoted argument such as `s/^d6(.{0,2})10.*/\1/` would cause a syntax error on the Bash command line, due to the unescaped `(` and `)`, suggests that Cygwin employs some kind of hybrid approach to parsing the command line (presumably built into each and every `.exe` that Cygwin comes with).

---
> If there is a better way to extract a part of string then please mention that also.

PowerShell has great regex support built in, and its `-replace` operator allows you to do what `sed`'s `s///` function does, only more efficiently, because the operation is performed in-process:

    # From PowerShell
    'd6121090' -replace '^d6(.{0,2})10.*', '$1' # -> '12'

---
#### PowerShell's string literals and escaping:

* In PowerShell's _expandable_ (double-quoted) strings (`"..."`), `\` has no special meaning (and neither do `{` and `}`).
    * PowerShell's escape character is `` ` ``, the so-called backtick, and inside `"..."` only it and `$` have special meaning (the latter for referencing variables and subexpressions to be expanded (interpolated)) and therefore need escaping with `` ` `` if meant to be used verbatim.

* Similar to POSIX-compatible shells such as Bash, PowerShell also has _verbatim_ (single-quoted) strings (`'...'`); `'...'` is generally preferable when expressing regexes or substitution expressions.
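For comparison, here is how the same verbatim-versus-expandable distinction looks on the POSIX-shell side that the answer alludes to. This is only an illustrative sketch of shell quoting, not PowerShell code, and the variable names are made up for the example.

```bash
#!/usr/bin/env bash
name='world'

# Double quotes: $name is expanded (interpolated).
expandable="hello $name"

# Single quotes: every character is taken literally, including $.
verbatim='hello $name'

printf '%s\n%s\n' "$expandable" "$verbatim"
```

As in PowerShell, the single-quoted form is the safer default for regexes and substitution expressions, because nothing inside it is interpreted by the shell.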