How to match an exact string of digits with bash regular expression?

Question

I'm trying to match an exact number with regex, but it's only matching the first digit.

Here's a sample of the text I'm working with:

0x03600007  0 1206   km-Inspiron-3593 ~ : bash — Konsole
0x04c00002  0 15610  km-Inspiron-3593 Minecraft 1.7.10

And here's my code:

regex="^0x[0-9a-z]{8}[[:space:]]+-?[0-9][[:space:]]15610"
[[ $string =~ $regex ]] && echo "${BASH_REMATCH}"

I'm trying to get the value in the first column where the third column is a known value (15610 in this case). So the expression should match 0x04c00002 0 15610 (and I would capture the first value from that), but it's only matching 0x03600007 0 1 (the remaining 4 digits are ignored).

For additional context, this text is the output of wmctrl -lp: the first column is a window ID, the second is the desktop number, and the third is the PID. So what I'm looking to extract is the window ID given a known PID.

I've tried things like \b (AFAIK this is for exact alphabetic word matches), brackets, quotes, no quotes.

Answer 1

Score: 4

Using awk:

awk '$3 == 15610{print $1}' file
0x04c00002

This prints the first column of every line whose third column equals 15610; for the sample file, that is 0x04c00002.
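When the PID is held in a shell variable, it can be passed to awk with -v rather than spliced into the program text. A minimal sketch, assuming the question's sample lines stand in for real wmctrl -lp output (the variable names sample, pid and window_id are illustrative, not from the original answer):

```bash
#!/usr/bin/env bash
# Sample data copied from the question; in real use this would come from `wmctrl -lp`.
sample='0x03600007  0 1206   km-Inspiron-3593 ~ : bash
0x04c00002  0 15610  km-Inspiron-3593 Minecraft 1.7.10'

pid=15610   # the known PID (illustrative variable name)

# -v hands the shell value to awk as the awk variable `pid`;
# $3 is the PID column and $1 is the window ID column.
window_id=$(awk -v pid="$pid" '$3 == pid { print $1 }' <<< "$sample")
echo "$window_id"
```

Because awk compares whole fields, a PID such as 156100 would not be mistaken for 15610.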

Answer 2

Score: 4

Updated after OP's latest update:

regex="(0x[0-9a-z]{8})[ ]+0[ ]+15610[ ]+"

while read -r line; do echo "###### $line"; [[ "$line" =~ $regex ]] && typeset -p BASH_REMATCH; done < file

###### 0x03600007  0 1206   km-Inspiron-3593 ~ : bash — Konsole
###### 0x04c00002  0 15610  km-Inspiron-3593 Minecraft 1.7.10
declare -ar BASH_REMATCH=([0]="0x04c00002  0 15610  " [1]="0x04c00002")
                                                      ^^^^^^^^^^^^^^^^

Where:

  • for the matching row, BASH_REMATCH[0] holds the text matched by the entire regex, while ...
  • BASH_REMATCH[1] holds the text matched by the capture group (i.e., the 1st column)

Or if we want to make sure the hex string is at the start of the line (and have the match consume the rest of the line):

regex="^(0x[0-9a-z]{8})[ ]+0[ ]+15610[ ]+.*$"

while read -r line; do echo "###### $line"; [[ "$line" =~ $regex ]] && typeset -p BASH_REMATCH; done < file

###### 0x03600007  0 1206   km-Inspiron-3593 ~ : bash — Konsole
###### 0x04c00002  0 15610  km-Inspiron-3593 Minecraft 1.7.10
declare -ar BASH_REMATCH=([0]="0x04c00002  0 15610  km-Inspiron-3593 Minecraft 1.7.10" [1]="0x04c00002")
                                                                                       ^^^^^^^^^^^^^^^^
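
The regexes above hard-code both the desktop column (0) and the PID. A hedged sketch of a more general variant follows: it takes the PID as a parameter and requires whitespace or end of line after it, so that 15610 cannot match the start of a longer number. Bash's =~ uses POSIX extended regular expressions, in which \b is not a standard construct, so an explicit delimiter is the more portable choice. The function name window_id_for_pid is hypothetical, the third sample line is made up, and the sketch assumes the desktop column is an integer that may be negative:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): reads `wmctrl -lp` style
# lines on stdin and prints the window ID of each line whose PID column is $1.
window_id_for_pid() {
    local pid=$1 line regex
    # Group 1: the hex window ID at the start of the line.
    # Next the desktop column (assumed to be an integer, possibly negative),
    # then the PID, which must be followed by whitespace or the end of the
    # line so that 15610 cannot match the start of a longer number.
    regex="^(0x[0-9a-f]+)[[:space:]]+-?[0-9]+[[:space:]]+${pid}([[:space:]]|\$)"
    while IFS= read -r line; do
        if [[ $line =~ $regex ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done
}

# Demo on the question's sample plus one made-up line with a longer PID.
result=$(window_id_for_pid 15610 <<'EOF'
0x03600007  0 1206   km-Inspiron-3593 ~ : bash
0x04c00002  0 15610  km-Inspiron-3593 Minecraft 1.7.10
0x05000001  0 156100 km-Inspiron-3593 made-up line
EOF
)
echo "$result"
```

In real use the input would presumably come from the command itself, for example window_id_for_pid "$pid" < <(wmctrl -lp).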

Replacing the BASH_REMATCH approach with a comparison of columns:

regex="0x[0-9a-z]{8}"

while read -r line
do
    echo "####### $line"
    read -r c1 c2 c3 rest_of_line <<< "$line"
    [[ "$c1" =~ $regex && "$c3" = "15610" ]] && echo "$c1"
done < file

####### 0x03600007  0 1206   km-Inspiron-3593 ~ : bash — Konsole
####### 0x04c00002  0 15610  km-Inspiron-3593 Minecraft 1.7.10
0x04c00002

# or

while read -r c1 c2 c3 rest_of_line
do
    echo "####### $c1 $c2 $c3 $rest_of_line"
    [[ "$c1" =~ $regex && "$c3" = "15610" ]] && echo "$c1"
done < file

####### 0x03600007 0 1206 km-Inspiron-3593 ~ : bash — Konsole
####### 0x04c00002 0 15610 km-Inspiron-3593 Minecraft 1.7.10
0x04c00002
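
The column comparison can also be wrapped in a function that stops at the first match and reports through its exit status whether anything was found. This is only a sketch: the name first_window_for_pid and the variable names are hypothetical, and the sample lines are copied from the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: same lookup using read's word splitting instead of a regex.
first_window_for_pid() {
    local want=$1 id desktop pid rest
    while read -r id desktop pid rest; do
        # A string comparison is enough here because the PID column is plain digits.
        if [[ $pid == "$want" ]]; then
            printf '%s\n' "$id"
            return 0          # stop at the first matching window
        fi
    done
    return 1                  # no line had that PID
}

sample='0x03600007  0 1206   km-Inspiron-3593 ~ : bash
0x04c00002  0 15610  km-Inspiron-3593 Minecraft 1.7.10'

wid=$(first_window_for_pid 15610 <<< "$sample") && echo "found: $wid"
first_window_for_pid 99999 <<< "$sample" || echo "no window for PID 99999"
```

Returning a status lets the caller write if wid=$(first_window_for_pid "$pid" < <(wmctrl -lp)); then ... fi without checking for an empty string.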

Answer 3

Score: 0

I think you've missed a repetition pattern. Using the first example:

0x03600007 0 1206 km-Inspiron-3593 ~ : bash — Konsole

with the provided regex:

original: `^0x[0-9a-z]{8}[[:space:]]+-?[0-9][[:space:]]`

first step: `^0x[0-9a-z]{8}[[:space:]]+` -> `0x03600007  `

second step: `-?` -> matches empty

third step: `[0-9][[:space:]]` -> `0 `

I'm guessing the extra digit came from the end of the regex, namely, the literal 15610.

If you want the full number, this regex might be better suited:

^0x[0-9a-z]{8}[[:space:]0-9]+[[:space:]]

Note the section [[:space:]0-9]+. If you only want the second number, you can probably make [0-9] the only repeated part. The regex I'm suggesting would let you continue capturing the rest of the text:

regex="^0x[0-9a-z]{8}[[:space:]0-9]+[[:space:]]"
[[ "0x03600007  0 1206   km-Inspiron-3593 ~ : bash — Konsole" =~ $regex ]] && echo "${BASH_REMATCH}"
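
To make the difference concrete, the following sketch runs both patterns from this answer against a shortened copy of the first sample line and prints each match in brackets so the trailing spaces are visible. The variable names single, repeated, first and second are illustrative only.

```bash
#!/usr/bin/env bash
# Shortened copy of the first sample line from the question.
line='0x03600007  0 1206   km-Inspiron-3593 ~ : bash'

# A single [0-9]: consumes only the one-digit desktop column.
single='^0x[0-9a-z]{8}[[:space:]]+-?[0-9][[:space:]]'
# A repeated class of spaces and digits: consumes both numeric columns.
repeated='^0x[0-9a-z]{8}[[:space:]0-9]+[[:space:]]'

[[ $line =~ $single ]] && first=${BASH_REMATCH[0]}
[[ $line =~ $repeated ]] && second=${BASH_REMATCH[0]}

# Brackets make the trailing whitespace in each match visible.
printf '[%s]\n' "$first" "$second"
```

The first pattern stops right after the desktop column, while the second runs through the PID column and the spaces that follow it. Neither pattern checks for a particular PID, so a literal PID (or a variable holding it) still has to be added to select a specific line.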

huangapple
  • Posted on August 9, 2023, 05:36:31
  • Please keep this link when reposting: https://go.coder-hub.com/76863343.html