April 13, 2023 17:35:00 · go

Can't get regex to work (.NET) without a space in the text

Question

I'm trying to split a series of lines based on the LAST underscore character.

It seems to have something to do with the one line (DA04) that has no space in its text, and I'm not familiar with regex concepts like lookahead and lookbehind.

Using:

^(.*?)_(\w+?)*$

Run against:

2022-366_DA00_Cover Sheet_C
2022-366_DA01_Locality Plan_E
2022-366_DA02_Site Plan_H
2022-366_DA03_Delivery Plan_E
2022-366_DA04_Floorplan_D
2022-366_DA05_Roof Plan_D
2022-366_DA06_Front  Side Building Elevations_F
2022-366_DA07_Drivethru  Rear Building Elevations_D
2022-366_DA08_External Finishes Schedule_A

Produces:

2022-366_DA00_Cover Sheet==C
2022-366_DA01_Locality Plan==E
2022-366_DA02_Site Plan==H
2022-366_DA03_Delivery Plan==E
**2022-366==D**
2022-366_DA05_Roof Plan==D
2022-366_DA06_Front  Side Building Elevations==F
2022-366_DA07_Drivethru  Rear Building Elevations==D
2022-366_DA08_External Finishes Schedule==A
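The failure can be reproduced in a few lines. The sketch below uses Python's re module as a stand-in for .NET (an assumption: both engines treat \w as including the underscore and handle lazy and repeated groups the same way for this pattern). The == join is inferred from the output above, split_last is a made-up helper name, and in .NET the equivalent replacement string would be $1==$2.

```python
import re

# The pattern from the question: a lazy group 1, then a lazy group 2 that is
# itself repeated by the trailing * (Python's re stands in for .NET here).
QUESTION_PATTERN = r"^(.*?)_(\w+?)*$"

lines = [
    "2022-366_DA00_Cover Sheet_C",
    "2022-366_DA04_Floorplan_D",
    "2022-366_DA05_Roof Plan_D",
]


def split_last(pattern, line):
    """Hypothetical helper: return 'group1==group2', or None if there is no match."""
    m = re.match(pattern, line)
    return f"{m.group(1)}=={m.group(2)}" if m else None


for line in lines:
    print(split_last(QUESTION_PATTERN, line))

# \w also matches "_": a run of word characters can cross underscores,
# but it cannot cross the space in "Roof Plan".
print(re.fullmatch(r"\w+", "DA04_Floorplan_D") is not None)
print(re.fullmatch(r"\w+", "DA05_Roof Plan_D") is not None)
```

It should print 2022-366_DA00_Cover Sheet==C, 2022-366==D and 2022-366_DA05_Roof Plan==D, then True and False: only a space stops the repeated \w group, which is why the line without a space in its title splits at the first underscore instead of the last.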

Answer 1

Score: 1

You do not have to repeat the capture group, as you want to match one or more word characters anyway (and if you use *, the whole group also becomes optional).

Neither \w+? nor .*? has to be non-greedy.

If you want to match a single uppercase character A-Z, you could also use [A-Z] instead of \w+.
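As a quick check of the points above, this Python sketch (Python's re standing in for .NET, an assumption that holds for these simple constructs) drops the repeated group, uses [A-Z] for the suffix, and compares a greedy with a lazy first group. The GREEDY and LAZY names are illustrative only.

```python
import re

# Group 2 restricted to one uppercase letter, and no repeated group.
GREEDY = r"^(.*)_([A-Z])$"
LAZY = r"^(.*?)_([A-Z])$"

lines = [
    "2022-366_DA04_Floorplan_D",
    "2022-366_DA06_Front  Side Building Elevations_F",
]

for line in lines:
    greedy = re.match(GREEDY, line).groups()
    lazy = re.match(LAZY, line).groups()
    # _([A-Z])$ only fits where "_" is the second-to-last character, so the
    # lazy and the greedy first group have to stop at the same underscore.
    print(greedy, greedy == lazy)
```

Both variants split at the last underscore on these lines, so the lazy ? buys nothing here.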

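The DA04 line goes wrong because \w also matches the underscore: with no space in DA04_Floorplan_D, (\w+?)* can consume everything after the first underscore, so the lazy (.*?) already succeeds there and group 1 is just 2022-366. You might write the pattern so that the word characters exclude the underscore: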

^(.*)_([^\W_]+)$

The pattern matches:

^ Start of string
(.*) Capture group 1, match the whole line, then backtrack to the last _
_ Match _
([^\W_]+) Capture group 2, match 1+ word chars except for _
$ End of string

See a regex demo
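A minimal sketch of this pattern applied line by line, assuming Python's re behaves like .NET for [^\W_] (both read it as "word characters except the underscore"). The sample is a subset of the lines from the question; results is an illustrative name.

```python
import re

# Greedy group 1, then "_" and 1+ word characters except "_".
PATTERN = re.compile(r"^(.*)_([^\W_]+)$")

sample = """2022-366_DA00_Cover Sheet_C
2022-366_DA04_Floorplan_D
2022-366_DA07_Drivethru  Rear Building Elevations_D"""

results = []
for line in sample.splitlines():
    m = PATTERN.match(line)
    if m:  # lines that do not end in _<suffix> are skipped
        results.append((m.group(1), m.group(2)))

for name, suffix in results:
    print(f"{name}=={suffix}")
```

The DA04 line now comes out as 2022-366_DA04_Floorplan==D. In C#, the same pattern should work with Regex.Match(line, @"^(.*)_([^\W_]+)$") and m.Groups[1].Value / m.Groups[2].Value.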

Answer 2

Score: 0

Your pattern:

^(.*?)_(\w+?)*$

does not seem right to me, as there is an additional quantifier on the second group. Furthermore, I don't think you want the ? (lazy) modifier on either of those groups. Here is what I read from your regex:

^      ... start of the line
(...)  ... first group
.*?    ... any character (.), zero or more times (*), as few as possible (?)
_      ... an underscore; because the first group is lazy, the earliest underscore is tried first
(...)* ... second group, repeated zero or more times (the additional quantifier)
\w+?   ... one or more word characters, as few as possible; +? is a single lazy quantifier, and \w also matches _
$      ... end of the line

Here is what I would try instead:

^(.*)_(.*)$

which translates to:

^     ... start of the line
(...) ... first group
.*    ... any character (.), zero or more times (*)
_     ... the last underscore (remember that .* is greedy)
(...) ... second group
.*    ... any character, zero or more times
$     ... end of the line

This should capture everything after the last underscore in the second group.
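A small comparison of this pattern with the one from Answer 1, sketched with Python's re as a stand-in for .NET (an assumption). The edge-case inputs 2022-366_DA04_ and 2022-366_DA04_Rev C are made up for illustration and do not appear in the question.

```python
import re

ANSWER2 = re.compile(r"^(.*)_(.*)$")       # pattern suggested above
ANSWER1 = re.compile(r"^(.*)_([^\W_]+)$")  # pattern from Answer 1

# Greedy (.*) backtracks only as far as the last underscore.
print(ANSWER2.match("2022-366_DA04_Floorplan_D").groups())

# Made-up edge cases: (.*) also accepts an empty or space-containing suffix,
# whereas [^\W_]+ needs at least one word character and no spaces.
print(ANSWER2.match("2022-366_DA04_").groups())
print(ANSWER1.match("2022-366_DA04_"))
print(ANSWER2.match("2022-366_DA04_Rev C").groups())
print(ANSWER1.match("2022-366_DA04_Rev C"))
```

On the question's sample both patterns agree; which behaviour is preferable for lines like these depends on whether they can occur in the real data.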
