Regular expression escape for grep in Bash

Question

What is the best way to convert a regular expression into a string that grep/sed will accept in bash?

For example, given the following regular expression:

(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Bash does not accept it, whether double-quoted or single-quoted (and thus this regular expression cannot be used in grep):

$ echo "(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"
-bash: syntax error near unexpected token `('

$ echo '(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'
>
> ^C

I assume that the regular expression needs to be escaped, but I didn't find any good tool that can do it for me.

Any idea how I can let grep use this regular expression in bash?

Answer 1

Score: 0

Let's combine two useful Bash features to get there.

First, you can completely avoid the need to escape the string by using a Here Doc with a quoted delimiter (i.e. <<"separator"). For example, you can write something like this:

cat <<"EOF"
(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
EOF
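
To see what the quoted delimiter buys you, here is a minimal sketch contrasting the two forms. The variable name and the sample text are made up for illustration; the only point is that the shell expands $name in the unquoted form and leaves it alone in the quoted one.

```shell
name=world   # hypothetical variable, only here to show expansion

# Unquoted delimiter: the shell expands $name inside the body.
unquoted=$(cat <<EOF
hello $name
EOF
)

# Quoted delimiter: the body is taken literally, with no expansion.
quoted=$(cat <<"EOF"
hello $name
EOF
)

printf '%s\n' "$unquoted" "$quoted"
```

The same holds for backticks, backslashes and both kinds of quotes, which is why the regex can be pasted unchanged.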

Second, by wrapping that Here Doc in a function, you can easily capture its output in a variable or a command substitution. From that point, you can pass it directly to grep or sed.

For example:

regex() {
cat <<"EOF"
(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
EOF
}

# Quote the command substitution so the shell does not word-split or glob the pattern.
echo "email@test.com" | grep -P "$(regex)"
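
If you would rather not go through a function and cat, a variation on the same idea (not part of the answer above, just a sketch) is to let read fill the variable straight from the Here Doc. The pattern below is a short made-up stand-in that still contains both kinds of quotes, a $ and a backtick; it uses grep -E so that it does not depend on PCRE support.

```shell
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal.
IFS= read -r regex <<'EOF'
^[a-z0-9'"$`]+@[a-z]+\.com$
EOF

# Always quote the variable when handing it to grep.
if printf '%s\n' "it's@example.com" | grep -Eq "$regex"; then
  result=match
else
  result="no match"
fi
echo "$result"
```

Note that read takes a single line only, which is enough for a one-line pattern like the one in the question.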

Note that your regex requires a Perl-compatible regex engine (PCRE), hence grep -P. Escaped hexadecimal sequences inside bracket expressions (e.g. [\x70-\x7f]) are not supported by the POSIX basic and extended syntaxes that grep and sed use by default. There, a backslash inside brackets is an ordinary character, so that sequence would match the characters \, x, 7, the range 0-\, and f.
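
A quick way to check the bracket-expression point for yourself, as a sketch: p is \x70, so a PCRE engine would match it with [\x70-\x7f], while a POSIX engine reads the brackets as a list of literal characters that contains x but not p. LC_ALL=C is set only to keep the range 0-\ predictable.

```shell
pattern='[\x70-\x7f]'

# POSIX extended syntax: the backslash inside brackets is a literal character.
if printf 'p\n' | LC_ALL=C grep -Eq "$pattern"; then p_ere=match; else p_ere="no match"; fi
if printf 'x\n' | LC_ALL=C grep -Eq "$pattern"; then x_ere=match; else x_ere="no match"; fi

echo "grep -E, input p: $p_ere"
echo "grep -E, input x: $x_ere"

# With a PCRE-enabled GNU grep, the same pattern is a hex range and p matches:
# printf 'p\n' | grep -P "$pattern"
```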

Answer 2

Score: 0

The only thing you need to know is how to enclose a string in single quotes when the string itself contains single quotes.

Let me simplify the string as an example:

O'Reilly

As you know, a backslash does not work to escape the single quote within single quotes:

str='O\'Reilly'         # wrong

Instead you can say:

str='O'\''Reilly'

It may look weird, but it is just a concatenation of 'O', \' and 'Reilly'.

'O'      ... single-quoted string "O"
\'       ... literal single quote
'Reilly' ... single-quoted string "Reilly"

Then you can assign your regex to a variable with:

regex='(?:[a-z0-9!#$%&'\''*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'\''*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'

# echo "$regex"

grep -P "$regex" <<< 'email@example.com'

Note that both single quotes inside the regex are handled in the same way as in the example above.
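
Since the question asks for a tool, the same rule can be applied mechanically. This is only a sketch: raw, quoted and copy are names I made up, and the sed call simply replaces every ' with '\'' before the result is wrapped in single quotes. The eval at the end is only there to check the round trip; do not eval strings you do not trust.

```shell
raw="O'Reilly"   # sample input, standing in for the real regex

# Replace each ' with '\'' and wrap the whole thing in single quotes.
quoted="'$(printf '%s' "$raw" | sed "s/'/'\\\\''/g")'"
printf '%s\n' "$quoted"

# Round trip: let the shell parse the quoted form back into a plain string.
eval "copy=$quoted"
printf '%s\n' "$copy"
```

In bash itself, printf '%q' also produces a shell-quoted form of its argument, although its output style differs from the one shown here.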

huangapple
  • Posted on June 19, 2023, 20:04:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76506454.html