Perl regular expression to read square brackets

Question

I would like to read the bit range inside the square brackets, and I want to keep the square brackets too.
The tricky part is class4:
sample1[1] is not a bit. The bit is only at the end of the line.

Example:

File1.txt
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

Expected result:

class1 bit = [4:4]
class2 bit = [2]
class3 bit = [7:3]
class4 bit = [7:3]

I tried a regular expression, but the square brackets are not matched. From the cheat sheet:
[] = Used for a set of characters.
. = Any character except newline.
ref: https://www.geeksforgeeks.org/perl-regex-cheat-sheet/

My code:

my $file = "File1.txt";
my $line;

open (FILE,"<", $file) or die "Cannot open a file: $!";
while (<FILE>){
    my $line = $_;
    if ($line =~ m/[..]/){
        $line = $&;
    }
}
close (FILE);

The result only shows: .........

I hope you can help me with some ideas. Thanks.

Answer 1

Score: 4

With your shown samples, please try the following regex in PCRE:

^([^-]*)->.*?(\[[^]]*\]);$

Here is the online demo for the above regex.

Explanation: a detailed breakdown of the above regex.

^            ##Match from the start of the value.
(            ##Create the 1st capturing group.
  [^-]*      ##Match everything before the next occurrence of -.
)            ##Close the 1st capturing group.
->           ##Match the literal ->.
.*?          ##Lazy match up to the first [...] that is followed by ;$ (this skips [1] in class4).
(            ##Create the 2nd capturing group.
  \[[^]]*    ##Match a literal [ followed by everything up to the first ].
  \]         ##Match the literal ].
)            ##Close the 2nd capturing group.
;$           ##Match a literal ; at the end of the value.
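A minimal Perl sketch of applying this regex line by line and printing the two capture groups. The helper name extract_bit is hypothetical, and the sample lines are hard-coded here instead of being read from File1.txt; the pattern itself is used unchanged, since Perl accepts the same syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "classN bit = [..]", or undef when the line does not match.
# $1 = everything before "->", $2 = the [...] group immediately before the final ";".
sub extract_bit {
    my ($line) = @_;
    return $line =~ /^([^-]*)->.*?(\[[^]]*\]);$/ ? "$1 bit = $2" : undef;
}

# Sample lines copied from File1.txt in the question.
my @lines = (
    'class1->Signal = sample1_sample2.sample3_sample4[4:4];',
    'class2->Signal = sample1.sample2.sample3_sample4_sample5[2];',
    'class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];',
    'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];',
);

for my $line (@lines) {
    my $result = extract_bit($line);
    print "$result\n" if defined $result;
}
```

This prints class1 bit = [4:4], class2 bit = [2], class3 bit = [7:3] and class4 bit = [7:3]. If the real file has Windows (CRLF) line endings, strip the \r first, because ;$ only tolerates a trailing \n.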

Answer 2

Score: 4

You could match the part that you want to remove and replace it with " bit = ":

^[^-]*\K->.*(?=\[[^][]*\];$)

Explanation

  • ^ Start of string
  • [^-]*\K Match optional chars other than - and forget what was matched so far using \K
  • ->.* Match -> and the rest of the line
  • (?=\[[^][]*\];$) Positive lookahead, assert [...]; at the end of the line

See a regex demo and a Perl demo

Example

use strict;
use warnings;

while (<DATA>)
{
  s/^[^-]*\K->.*(?=\[[^][]*\];$)/ bit = /;
  print $_;
}

__DATA__
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

Output

class1 bit = [4:4];
class2 bit = [2];
class3 bit = [7:3];
class4 bit = [7:3];


Or a bit more specific regex:

^class\d+\K->.*(?=\[[^][]*\];$)

See another regex demo.
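A minimal sketch wrapping the more specific pattern in a subroutine (the name to_bit_line is hypothetical). It uses the /r modifier (Perl 5.14 or later), so the substitution returns a modified copy instead of changing its argument; lines that do not start with classN come back unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper built on the "more specific" pattern:
#   ^class\d+  keep the class name (\K drops it from the replaced text)
#   ->.*       the part to remove; greedy .* backtracks to the last [...];
#   (?=...)    lookahead: the final [...]; must follow, but is not replaced
# The /r modifier returns the modified copy and leaves $line untouched.
sub to_bit_line {
    my ($line) = @_;
    return $line =~ s/^class\d+\K->.*(?=\[[^][]*\];$)/ bit = /r;
}

for my $line (
    'class1->Signal = sample1_sample2.sample3_sample4[4:4];',
    'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];',
) {
    print to_bit_line($line), "\n";
}
```

As in the output above, the trailing ; is kept. Using class\d+ rather than class\d also accepts multi-digit names such as class12.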

Answer 3

Score: 2

[..] creates a character class that matches any one of the characters within the brackets, in this case a period.

Since you are only matching literal periods, that is all you see.

This problem can be solved with a fairly simple regex.

Since you only want the last bracket, you can rely on the greediness of .* to skip any brackets in the middle:

use strict;
use warnings;

my $file = "File1.txt";
my $line;

open (FILE, "<", $file) or die "Cannot open a file: $!";
while (<FILE>){
    $line = $_;
    if( $line =~ /(class\d).*(\[[^\]]*\]);/ ){
        $line = "$1 bit = $2";
        print "$line\n";    # e.g. "class4 bit = [7:3]"
    }
}
close (FILE);

The regex /(class\d).*(\[[^\]]*\]);/ matches "class" followed by a digit; then .* matches the rest of the line (because it is greedy) and gives back just enough characters to match (\[[^\]]*\]);
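To see why the greedy .* matters for class4, here is a small sketch contrasting a greedy and a lazy quantifier on that line, with no end anchor. The helper names last_bracket and first_bracket are hypothetical.

```perl
use strict;
use warnings;

my $class4 = 'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];';

# Greedy .* runs to the end of the line, then backtracks until (\[...\]) can match,
# so it settles on the LAST bracket group.
sub last_bracket  { my ($s) = @_; return $s =~ /.*(\[[^\]]*\])/  ? $1 : undef }

# Lazy .*? grows one character at a time, so without an anchor
# it stops at the FIRST bracket group.
sub first_bracket { my ($s) = @_; return $s =~ /.*?(\[[^\]]*\])/ ? $1 : undef }

print "greedy: ", last_bracket($class4),  "\n";   # greedy: [7:3]
print "lazy:   ", first_bracket($class4), "\n";   # lazy:   [1]
```

A lazy .*? can still reach [7:3] when the pattern is anchored with ;$, because [1] is followed by + rather than ; and is therefore rejected.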

Using ^ as the first character in a character class makes it match anything EXCEPT the characters within.
To match a literal [ you have to escape it like \[.
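A short sketch of the difference between the question's [..] and escaped brackets, run on the class2 sample line. The helper names match_dot_class and match_brackets are hypothetical.

```perl
use strict;
use warnings;

# [..] is a character class whose only member is "." (listed twice),
# so it matches a single literal period, never a bracket.
sub match_dot_class { my ($s) = @_; return $s =~ /([..])/ ? $1 : undef }

# \[ and \] are literal brackets; [^\]]* is a negated class ("anything except ]").
sub match_brackets  { my ($s) = @_; return $s =~ /(\[[^\]]*\])/ ? $1 : undef }

my $class2 = 'class2->Signal = sample1.sample2.sample3_sample4_sample5[2];';
print match_dot_class($class2), "\n";   # .
print match_brackets($class2),  "\n";   # [2]
```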

(              # capture to $1
    class\d    # match "class" followed by a digit
)              # end capture
.*             # match anything (greedy)
(              # capture to $2
    \[         # literal [
    [^\]]*     # match anything, except ] (greedy)
    \]         # literal ]
)              # end capture
;              # match ;

The parentheses save what they match into the variables $1, $2, and so on.
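The captures can also be taken directly from the match: in list context a successful match returns ($1, $2, ...). A minimal sketch, with a hypothetical helper name class_and_bit:

```perl
use strict;
use warnings;

# Hypothetical helper: in list context a successful match returns ($1, $2) directly;
# on failure it returns an empty list, so both variables stay undef.
sub class_and_bit {
    my ($line) = @_;
    my ($class, $bit) = $line =~ /(class\d).*(\[[^\]]*\]);/;
    return ($class, $bit);
}

my ($class, $bit) = class_and_bit('class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];');
print "$class bit = $bit\n" if defined $bit;   # class3 bit = [7:3]
```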

This can also be done with a substitution, using the same regex and the /r flag to return the modified string:

while (<FILE>){
    $line = s/(class\d).*(\[[^\]]*\]);/$1 bit = $2/r;
    print $line;    # s///r keeps the trailing newline of $_
}

Here's a simple command line one-liner that'll do the same:

perl -wlp -e 's/(class\d).*(\[[^\]]*\]);/$1 bit = $2/' File1.txt

Change ' to " to run on Windows.

Answer 4

Score: 1

cat /tmp/a.txt
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

sed -e 's/->.*\[/ bit = [/g' -e 's/;//g'  /tmp/a.txt
class1 bit = [4:4]
class2 bit = [2]
class3 bit = [7:3]
class4 bit = [7:3]

huangapple
  • Posted on May 18, 2023, 12:02:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76277648.html