Why does this regex pattern fail to match a given string, and how can it be corrected?
Question
Using a Python regex, I want to capture all text that satisfies one of the three patterns described below (~ means zero or more characters).
[pattern1] NAME_ "words_or_numbers" AGE_ my_num ~;
[pattern2] NAME_ "words_or_numbers" DESC_ my_num ~;
[pattern3] NAME_ADD_ "words_or_numbers" CHAR_DESC_ADD_ word_or_numbers_or_underscore DESC_ my_num ~;
For [pattern1], [pattern2], and [pattern3], I'd like to find only the text whose my_num matches a given value. In the example below, I picked 373 and 416 as the my_num values.
(Note that each match can span multiple lines.)
Original Text:
NAME_ "Hello" AGE_ 373 0;
NAME_ "Summer" AGE_ 340 0;
NAME_ "Sam" AGE_ 416 14;
NAME_ "Edward" DESC_ 373 ABC_DEF_G "These are users.
age, description
- example(0x15) , Isfalse : 0xF+df
- safe.
- (t) = + 1";
NAME_ "Alex" DESC_ 373 asdf 65535;
NAME_ADD_ "Crystal" CHAR_DESC_ADD_ GGE_R DESC_ 373 ABCD 340;
NAME_ "Ray" DESC_ 111 asdfs 3;
NAME_ "Brown" DESC_ 416 asdfs 3;
NAME_ADD_ "Hailey" CHAR_DESC_ADD_ GGE3 DESC_ 416 ABCD 120;
NAME_ "Watson" AGE_ 373 0;
NOT_NAME_ 324 XYZ 22 "A" 1 "B" 2 "C" 3 "R" ;
Desired Output:
NAME_ "Hello" AGE_ 373 0;
NAME_ "Sam" AGE_ 416 14;
NAME_ "Edward" DESC_ 373 ABC_DEF_G "These are users.
age, description
- example(0x15) , Isfalse : 0xF+df
- safe.
- (t) = + 1";
NAME_ "Alex" DESC_ 373 asdf 65535;
NAME_ADD_ "Crystal" CHAR_DESC_ADD_ GGE_R DESC_ 373 ABCD 340;
NAME_ "Brown" DESC_ 416 asdfs 3;
NAME_ADD_ "Hailey" CHAR_DESC_ADD_ GGE3 DESC_ 416 ABCD 120;
NAME_ "Watson" AGE_ 373 0;
I've tried the following regex (with the re.findall method):
(?s)((NAME_ .+ (AGE_|DESC_) (373|416) .?(?=NAME_|NOT_NAME_|$))|(NAME_ADD_ .+ CHAR_DESC_ADD_ .+ DESC_ (373|416) .?(?=NAME_|NOT_NAME_|$)))
but it captured nothing. What's wrong with my attempt, and how can this be done properly?
Answer 1
Score: 2
The main problem I see with the regex is that, after my_num, it matches only a space and a single optional character (.?) before the lookahead. No sequence in your original text fits that, which is why the result is empty. In addition, each .+ should be changed to exclude the ; character. Otherwise, because (?s) lets . match newlines, the regex could match across records, up to the whole file, as long as the first and last few characters together fit one of the patterns.
You could change each .+ to [^;]+ and the .? after my_num to [^;]*; so that every part of the match stays inside a single ;-terminated record. [^;] matches any character that is not ; (including a newline). With this change, the lookahead assertion (?=NAME_|NOT_NAME_|$) is no longer needed. The new regex could look like this:
(?s)((NAME_ [^;]+ (AGE_|DESC_) (373|416) [^;]*;)|(NAME_ADD_ [^;]+ CHAR_DESC_ADD_ [^;]+ DESC_ (373|416) [^;]*;))
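As a rough sketch of how this pattern might be used, the snippet below runs it over the sample text from the question. The helper name find_records and its my_nums parameter are hypothetical, introduced only for illustration. The groups are made non-capturing so that re.findall returns whole records instead of tuples of groups, and (?s) is dropped because [^;] already matches newlines. It assumes that no ; appears inside a quoted description.

```python
import re

text = '''NAME_ "Hello" AGE_ 373 0;
NAME_ "Summer" AGE_ 340 0;
NAME_ "Sam" AGE_ 416 14;
NAME_ "Edward" DESC_ 373 ABC_DEF_G "These are users.
age, description
- example(0x15) , Isfalse : 0xF+df
- safe.
- (t) = + 1";
NAME_ "Alex" DESC_ 373 asdf 65535;
NAME_ADD_ "Crystal" CHAR_DESC_ADD_ GGE_R DESC_ 373 ABCD 340;
NAME_ "Ray" DESC_ 111 asdfs 3;
NAME_ "Brown" DESC_ 416 asdfs 3;
NAME_ADD_ "Hailey" CHAR_DESC_ADD_ GGE3 DESC_ 416 ABCD 120;
NAME_ "Watson" AGE_ 373 0;
NOT_NAME_ 324 XYZ 22 "A" 1 "B" 2 "C" 3 "R" ;
'''


def find_records(text, my_nums):
    """Return every ;-terminated record whose my_num is in my_nums.

    Hypothetical helper: it builds the answer's pattern with
    non-capturing groups for the given my_num values.
    """
    nums = "|".join(re.escape(str(n)) for n in my_nums)
    pattern = re.compile(
        # pattern1 / pattern2: NAME_ "..." AGE_|DESC_ my_num ~;
        rf'NAME_ [^;]+ (?:AGE_|DESC_) (?:{nums}) [^;]*;'
        # pattern3: NAME_ADD_ "..." CHAR_DESC_ADD_ ... DESC_ my_num ~;
        rf'|NAME_ADD_ [^;]+ CHAR_DESC_ADD_ [^;]+ DESC_ (?:{nums}) [^;]*;'
    )
    # No capturing groups, so findall yields the full matched text.
    return pattern.findall(text)


records = find_records(text, [373, 416])
for record in records:
    print(record)
    print("---")
```

On the sample text this yields the eight records listed under Desired Output, with the multi-line "Edward" record returned as a single string.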