sed giving error when repetition is used in PowerShell

Question

I am writing a PowerShell script and need to use sed to extract part of the output of another command. It looks something like this:

echo "d6121090" | sed -E "s/^d6(.*)10.*/\\1/"
12

But if I replace .* in the sed script with .{0,} or .{0,2}, sed gives me an error:

echo "d6121090" | sed -E "s/^d6(.{0,})10.*/\\1/"
/usr/bin/sed: can't read s/^d6(.)10.*/\1/: No such file or directory
echo "d6121090" | sed -E "s/^d6(.{0,2})10.*/\\1/"
/usr/bin/sed: can't read s/^d6(.2)10.*/\1/: No such file or directory

I'm not sure whether the error has to do with sed or with PowerShell.

I'm using sed provided by Cygwin (4.9-1), version is (GNU sed) 4.9.

If there is a better way to extract a part of the string, then please mention that also.

Answer 1

Score: 3


tl;dr

Workarounds, in descending order of preference (note that echo "d6121090" was simplified to just 'd6121090', using PowerShell's implicit output behavior - see this answer: https://stackoverflow.com/a/69792182/45375):

  • Use '...' quoting embedded inside "...":

    'd6121090' | sed -E "'s/^d6(.{0,2})10.*/\1/'"
    
    • Note: This takes advantage of the fact that - unusually - the Cygwin-provided executables also recognize '...' quoting on their command lines.
    • Using '"..."' instead (quotes swapped) - surprisingly - works as-is up to PowerShell v7.2.x; it should never have worked, but did due to a long-standing bug with respect to passing arguments with embedded " characters to external programs. This was (mostly) fixed in v7.3, where you can opt into the old, broken behavior with $PSNativeCommandArgumentPassing = 'Legacy'; see this answer for details.
  • Use an extra space to force PowerShell to quote the sed script on the process command line built behind the scenes (see next section).

    'd6121090' | sed -E 's/^d6(.{0,2})10.*/\1/ ' # <- Note the trailing space
    
  • Call via cmd.exe, which allows you to control quoting explicitly (the spaces around " are just for readability).

    'd6121090' | cmd /c " sed -E 's/^d6(.{0,2})10.*/\1/' "
    
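  • Use --%, the stop-parsing token (https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Parsing#the-stop-parsing-token), to pass the remainder of the command line as-is, but note its fundamental limitations, discussed in the bottom section of this answer: http://stackoverflow.com/a/42601912/45375

    'd6121090' | sed -E --% 's/^d6(.{0,2})10.*/\1/'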
    

Background information:

You're seeing the confluence of two surprising behaviors:

  • When you invoke a Cygwin utility such as sed from a Windows shell such as PowerShell, the command line is interpreted as if it had been submitted from a POSIX-compatible shell such as Bash.

    • On the plus side, this allows you to use '...' quoting even when calling from cmd.exe, even though CLIs on Windows normally only understand "..." quoting.
  • PowerShell - of necessity (on Windows) - rebuilds the command line it uses to launch child processes behind the scenes, because its own command-line syntax - notably the ability to use '...' strings - cannot be expected to be understood by outside CLIs (which on Windows are expected to recognize "..." strings only).

    • However, PowerShell employs on-demand double-quoting when it rebuilds the command line, based solely on whether an argument contains spaces. Therefore, what was verbatim "s/^d6(.{0,2})10.*/\\1/" on the original command line is placed as verbatim s/^d6(.{0,2})10.*/\\1/ - without quoting! - on the process command line.

    • This is normally not a problem, given that most CLIs use their arguments verbatim rather than subjecting them to shell-like interpretation.

      • It is, however, also a problem with cmd.exe, both in direct cmd /c calls and indirectly, when calling batch files; GitHub issue #15143 (https://github.com/PowerShell/PowerShell/issues/15143), among other suggested improvements, proposed modifying PowerShell to accommodate this quirk and double-quote even space-less arguments if they contain cmd.exe metacharacters, but it looks like such improvements won't be implemented.

Therefore, the command line that PowerShell actually submits is the following - note the absence of quoting:

sed -E s/^d6(.{0,2})10.*/\\1/

The lack of quoting around the sed script causes Cygwin to treat { and } as a Bash-style brace expansion expression, which therefore expands to multiple arguments, with the extra argument getting interpreted as a - non-existent - filename.

You can verify this as follows, using Cygwin's printf.exe:

# From PowerShell
printf '%s\n' 's/^d6(.{0,2})10.*/\1/'

Output:

s/^d6(.0)10.*/1/
s/^d6(.2)10.*/1/

That is, the brace expansion turned .{0,2} into two arguments, one containing .0 and one containing .2, each combined with the prefix and suffix strings, and the second argument was then interpreted as a filename.
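The same expansion can be reproduced in Bash itself, without Cygwin or PowerShell. The following is only a minimal sketch: the word x.{0,2}y is a made-up stand-in for the sed script, and bash -c is used so that the result does not depend on which shell runs the outer lines.

```shell
# Unquoted: Bash brace expansion turns the single word into two words
# (x.0y and x.2y), so the argument count is 2.
unquoted_count=$(bash -c 'set -- x.{0,2}y; echo $#')

# Quoted: the braces are taken literally, so there is only one word.
quoted_count=$(bash -c "set -- 'x.{0,2}y'; echo \$#")

echo "unquoted: $unquoted_count, quoted: $quoted_count"
# prints: unquoted: 2, quoted: 1
```

This mirrors what happens to the unquoted sed script: one intended argument becomes two, and sed treats the second one as an input file name.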


There are several workarounds, as shown above, but the simplest one in this case is to append a space to your sed script. This doesn't interfere with the script's function, but forces PowerShell to enclose the script in "..." behind the scenes. In cases where adding an extra space would interfere with the intended functionality, such as when passing a space-less search pattern to grep, use the "'...'" technique shown at the top. With the trailing space:

'd6121090' | sed -E 's/^d6(.{0,2})10.*/\1/ ' # <- Note the trailing space

Note:

  • '...' quoting is used on the PowerShell side, which is generally preferable when you're dealing with verbatim (literal) values.

  • The \ in \1 is not doubled; to PowerShell itself, \ is never special.

    • See the bottom section for more information about PowerShell string literals.
  • Even though the '...' quoting gets translated to "..." quoting behind the scenes, \ also does not require escaping on the Cygwin side (though doing so would also work). This, along with the fact that an unquoted argument such as s/^d6(.{0,2})10.*/\1/ would cause a syntax error on the Bash command line - due to the unescaped ( and ) - suggests that Cygwin employs some kind of hybrid approach to parsing the command line (presumably built into each and every .exe that Cygwin comes with).
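As a cross-check that the sed script itself is valid and only the quoting was at fault, here is a minimal sketch meant to be run in a POSIX-compatible shell such as Bash (not PowerShell). It assumes GNU sed or another sed that supports -E; the variable names are illustrative only.

```shell
# Single quotes pass the sed script verbatim: no brace expansion occurs
# and the backslash in \1 is preserved.
input='d6121090'

# Bounded repetition: {0,2}
bounded=$(printf '%s\n' "$input" | sed -E 's/^d6(.{0,2})10.*/\1/')

# Open-ended repetition: {0,}, which behaves like * here
open_ended=$(printf '%s\n' "$input" | sed -E 's/^d6(.{0,})10.*/\1/')

printf 'bounded: %s\nopen-ended: %s\n' "$bounded" "$open_ended"
```

Both forms extract 12, the same result the original .* version produced.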


> if there is a better way to extract a part of string then please mention that also.

PowerShell has great regex support built in, and its -replace operator allows you to do what sed's s/// function does, only more efficiently, because the operation is performed in-process:

# From PowerShell
'd6121090' -replace '^d6(.{0,2})10.*', '$1' # -> '12'

PowerShell's string literals and escaping:

  • In PowerShell's expandable (double-quoted) strings ("..."), \ has no special meaning (and neither do { and }).

    • PowerShell's escape character is `, the so-called backtick. Inside "...", only it and $ have special meaning (the latter for referencing variables and subexpressions to be expanded (interpolated)), so both need escaping with ` if meant to be used verbatim.

    • Similar to POSIX-compatible shells such as Bash, PowerShell also has verbatim (single-quoted) strings ('...'). '...' is generally preferable when expressing regexes or substitution expressions.

Posted on June 21, 2023, 22:52:34 (source: https://go.coder-hub.com/76524615.html)