2023-06-19 20:04:35

regular expression escape for grep in bash

Question

What is the best way to convert a regular expression to a string which can be accepted by grep/sed in bash?

For example, given the following regular expression:

(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Bash does not like it (and thus this regular expression cannot be used in grep):

$ echo "(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"
-bash: syntax error near unexpected token `('

$ echo '(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'
>
> ^C

I assume that the regular expression needs to be escaped, but I didn't find any good tool that can do it for me.

Any idea how I can let grep use this regular expression in Bash?

Answer 1

Score: 0

Let's combine two useful Bash features to get there.

First, you can completely avoid the need to escape a string by using a here-document with a quoted delimiter (i.e. <<"separator"). For example, you can write something like this:

cat <<"EOF"
(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
EOF

Second, by wrapping that here-document in a function, you can easily capture its output in a variable. From that point, you can pass that variable directly to grep or sed.

For example:

function regex() {
  cat <<"EOF"
(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
EOF
}

# Quote the command substitution so the shell does not split or glob the pattern.
echo "email@test.com" | grep -P "$( regex )"
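
A variation on the same idea, as a minimal sketch: instead of wrapping the here-document in a function, read it straight into a variable with `read -r`. The variable name `pattern` and the shortened, ERE-compatible email pattern are assumptions chosen for illustration; they stand in for the full regex so the example also runs without `grep -P`.

```shell
# Read one raw line from a quoted here-document into a variable.
# IFS= preserves surrounding whitespace; -r keeps backslashes literal.
# The pattern below is a simplified stand-in, not the regex from the question.
IFS= read -r pattern <<'EOF'
[a-z0-9!#$%&'*+/=?^_`{|}~-]+@[a-z0-9-]+(\.[a-z0-9-]+)+
EOF

# Quote the expansion so the shell passes the pattern to grep unchanged.
printf '%s\n' 'email@test.com' | grep -E "$pattern"
```

Because the delimiter is quoted, the shell performs no expansion on the line, so quotes, backticks, `$`, and backslashes arrive in the variable exactly as typed.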

Note that your regex requires a Perl-compatible regex engine (PCRE), which is why grep is called with -P. Escaped hexadecimal sequences inside bracket expressions (e.g. [\x70-\x7f]) are not supported in POSIX bracket expressions, where a backslash is an ordinary character. With grep -E, for example, the previous expression would match these characters instead: \, x, 7, the range 0-\, x, 7, and f.
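
The difference can be observed with a small experiment. This is a sketch that assumes GNU grep; the `-P` part is guarded because not every grep build includes PCRE support.

```shell
# In a POSIX bracket expression the backslash is an ordinary character,
# so with -E the set [\x41] contains \, x, 4 and 1, not the letter A.
printf 'x\nA\n' | grep -E '[\x41]'    # selects the line "x"

# With -P (PCRE), \x41 is the character with hex code 41, i.e. "A".
# Run this part only if the local grep accepts -P.
if printf 'A\n' | grep -qP 'A' 2>/dev/null; then
  printf 'x\nA\n' | grep -P '[\x41]'  # selects the line "A"
fi
```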

Answer 2

Score: 0

The only thing you need to know is how to enclose a string in single quotes when the string itself contains single quotes. Let me simplify the string as an example:

O'Reilly

As you know, a backslash does not escape a single quote within single quotes:

str='O\'Reilly'         # wrong

Instead you can write:

str='O'\''Reilly'

It may look weird, but it is just a concatenation of 'O', \' and 'Reilly':

'O'      ... single-quoted string "O"
\'       ... literal single quote
'Reilly' ... single-quoted string "Reilly"
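
The same rewrite can be automated. The sketch below defines a hypothetical helper, `shell_quote` (the name is an assumption, not a standard command), that replaces every `'` with `'\''` and wraps the result in single quotes, so the output can be pasted into a script as is.

```shell
# Hypothetical helper: print its first argument as a single-quoted shell word.
shell_quote() {
  # sed turns each ' into '\'' ; the outer printf adds the enclosing quotes.
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

quoted=$(shell_quote "O'Reilly")
printf '%s\n' "$quoted"        # the quoted form, ready to paste into a script

# Round trip: letting the shell parse the quoted form yields the original text.
eval "roundtrip=$quoted"
printf '%s\n' "$roundtrip"
```

For a long regex, the raw text can be fed to such a helper from a file or a quoted here-document rather than typed on the command line, where it would need quoting in the first place.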

Then you can assign your regex to a variable with:

regex='(?:[a-z0-9!#$%&'\''*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'\''*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'

# echo "$regex"

grep -P "$regex" <<< 'email@example.com'

Note that the two single quotes in the regex are handled as in the example above.
