January 6, 2023, 12:45:14

Why does this regex pattern fail to match a given string, and how can it be corrected?

Question

I want to use a Python regex to capture every piece of text that satisfies one of the three patterns described below.

(~ means zero or more characters)

[pattern1] NAME_ "words_or_numbers" AGE_ my_num ~;

[pattern2] NAME_ "words_or_numbers" DESC_ my_num ~;

[pattern3] NAME_ADD_ "words_or_numbers" CHAR_DESC_ADD_ words_or_numbers_or_underscores DESC_ my_num ~;

For all three patterns, I'd like to find only the records whose my_num is one of a given set of values. In the example below, I picked 373 and 416 as the my_num values.

(Note that a match for any pattern can span multiple lines.)

Original Text:

NAME_ "Hello" AGE_ 373 0;
NAME_ "Summer" AGE_ 340 0;
NAME_ "Sam" AGE_ 416 14;
NAME_ "Edward" DESC_ 373 ABC_DEF_G "These are users.

age, description

- example(0x15) , Isfalse : 0xF+df

- safe.

- (t) = + 1";
NAME_ "Alex" DESC_ 373 asdf 65535;
NAME_ADD_ "Crystal" CHAR_DESC_ADD_ GGE_R DESC_ 373 ABCD 340;
NAME_ "Ray" DESC_ 111 asdfs 3;
NAME_ "Brown" DESC_ 416 asdfs 3;
NAME_ADD_ "Hailey" CHAR_DESC_ADD_ GGE3 DESC_ 416 ABCD 120;
NAME_ "Watson" AGE_ 373 0;
NOT_NAME_ 324 XYZ 22 "A" 1 "B" 2 "C" 3 "R" ;

Desired Output:

NAME_ "Hello" AGE_ 373 0;
NAME_ "Sam" AGE_ 416 14;
NAME_ "Edward" DESC_ 373 ABC_DEF_G "These are users.

age, description

- example(0x15) , Isfalse : 0xF+df

- safe.

- (t) = + 1";
NAME_ "Alex" DESC_ 373 asdf 65535;
NAME_ADD_ "Crystal" CHAR_DESC_ADD_ GGE_R DESC_ 373 ABCD 340;
NAME_ "Brown" DESC_ 416 asdfs 3;
NAME_ADD_ "Hailey" CHAR_DESC_ADD_ GGE3 DESC_ 416 ABCD 120;
NAME_ "Watson" AGE_ 373 0;

I tried the following regex with re.findall:

(?s)((NAME_ .+ (AGE_|DESC_) (373|416) .?(?=NAME_|NOT_NAME_|$))|(NAME_ADD_ .+ CHAR_DESC_ADD_ .+ DESC_ (373|416) .?(?=NAME_|NOT_NAME_|$)))

but it captured nothing. What's wrong with my attempt, and how can this be done properly?

Answer 1

Score: 2

The main problem I see with the regex is that, after my_num, it matches only a space and a single optional character (the .?) before the lookahead has to succeed. No sequence in your original text fits that, which is why the result is empty. In addition, the .+ should be changed to exclude the ; character. Otherwise the regex could match the whole file, as long as the first and last few characters together match one of the patterns.
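
Both points can be checked on a shortened sample. The sketch below is only an illustration: the three-line text is cut down from the question, and the second pattern (named greedy here) is a hypothetical variant that was not part of the question. It relaxes only the tail to .*? while keeping the greedy .+, to show the over-matching described above.

```python
import re

# Shortened sample taken from the question's original text.
text = (
    'NAME_ "Hello" AGE_ 373 0;\n'
    'NAME_ "Summer" AGE_ 340 0;\n'
    'NAME_ "Sam" AGE_ 416 14;\n'
)

# The pattern from the question, unchanged.
original = (
    r'(?s)((NAME_ .+ (AGE_|DESC_) (373|416) .?(?=NAME_|NOT_NAME_|$))'
    r'|(NAME_ADD_ .+ CHAR_DESC_ADD_ .+ DESC_ (373|416) .?(?=NAME_|NOT_NAME_|$)))'
)
# After "373 " comes "0;\n" and after "416 " comes "14;": both are longer
# than the single character that .? can consume, so nothing matches.
original_result = re.findall(original, text)
print(original_result)

# Hypothetical variant (not from the question): the tail is relaxed to .*?
# but .+ stays greedy, so one match runs from the first NAME_ to the last
# record with a wanted number and swallows the unwanted "Summer" record.
greedy = r'(?s)NAME_ .+ (?:AGE_|DESC_) (?:373|416) .*?(?=NAME_|NOT_NAME_|$)'
greedy_result = re.findall(greedy, text)
print(len(greedy_result))
print("Summer" in greedy_result[0])
```

This is why fixing the tail alone is not enough: the parts between the keywords also have to be stopped at the record terminator.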

You could change each .+ to [^;]+ and the .? after my_num to [^;]*;. The class [^;] matches any character that is not ;, newlines included, so records that span several lines are still covered. With this change the lookahead assertion (?=NAME_|NOT_NAME_|$) is no longer needed. The new regex could look like this:

(?s)((NAME_ [^;]+ (AGE_|DESC_) (373|416) [^;]*;)|(NAME_ADD_ [^;]+ CHAR_DESC_ADD_ [^;]+ DESC_ (373|416) [^;]*;))
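
As a rough sketch of how this could be used, the snippet below builds the same pattern for an arbitrary list of my_num values. The helper name build_pattern and the shortened sample text are assumptions for illustration. Two details go beyond the regex above and are also my own additions: the groups are non-capturing, so that findall returns whole matches instead of tuples of groups, and a \b is placed before NAME_ so that the NAME_ inside NOT_NAME_ cannot start a match. The (?s) flag is dropped because no . remains in the pattern.

```python
import re

def build_pattern(nums):
    """Compile a pattern for records whose my_num is one of nums.

    Hypothetical helper; assumes no ';' occurs inside a record
    before its terminating ';'.
    """
    # re.escape is only defensive; for plain digits it changes nothing.
    num = "|".join(re.escape(str(n)) for n in nums)
    return re.compile(
        rf'\bNAME_ [^;]+ (?:AGE_|DESC_) (?:{num}) [^;]*;'
        rf'|\bNAME_ADD_ [^;]+ CHAR_DESC_ADD_ [^;]+ DESC_ (?:{num}) [^;]*;'
    )

# Shortened sample based on the question's original text.
text = '''NAME_ "Hello" AGE_ 373 0;
NAME_ "Summer" AGE_ 340 0;
NAME_ "Edward" DESC_ 373 ABC_DEF_G "These are users.
- safe.";
NAME_ADD_ "Crystal" CHAR_DESC_ADD_ GGE_R DESC_ 373 ABCD 340;
NAME_ "Ray" DESC_ 111 asdfs 3;
NAME_ADD_ "Hailey" CHAR_DESC_ADD_ GGE3 DESC_ 416 ABCD 120;
NOT_NAME_ 324 XYZ 22 "A" 1 "B" 2 "C" 3 "R" ;
'''

# No capturing groups, so findall returns a list of full-match strings.
matches = build_pattern([373, 416]).findall(text)
for m in matches:
    print(m)
    print("---")
```

If a quoted description can itself contain a ; character, [^;] would cut the record short, and the quoted part would need to be matched separately.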
