Can't get regex to work (.NET) without a space in the text
Question
I'm trying to split a series of lines based on the LAST underscore character.
The pattern fails on the one line that has no space in its text. I'm not familiar with regex concepts like lookahead and lookbehind.
Using:
^(.*?)_(\w+?)*$
Run against:
2022-366_DA00_Cover Sheet_C
2022-366_DA01_Locality Plan_E
2022-366_DA02_Site Plan_H
2022-366_DA03_Delivery Plan_E
2022-366_DA04_Floorplan_D
2022-366_DA05_Roof Plan_D
2022-366_DA06_Front Side Building Elevations_F
2022-366_DA07_Drivethru Rear Building Elevations_D
2022-366_DA08_External Finishes Schedule_A
Produces:
2022-366_DA00_Cover Sheet==C
2022-366_DA01_Locality Plan==E
2022-366_DA02_Site Plan==H
2022-366_DA03_Delivery Plan==E
**2022-366==D**
2022-366_DA05_Roof Plan==D
2022-366_DA06_Front Side Building Elevations==F
2022-366_DA07_Drivethru Rear Building Elevations==D
2022-366_DA08_External Finishes Schedule==A
Answer 1
Score: 1
You do not have to repeat the capture group, since you want to match 1 or more word characters. (If you put * on the group, the whole group also becomes optional.)
Neither \w+? nor .*? has to be non-greedy.
If you want to match a single uppercase char A-Z, you could also use [A-Z] instead of \w+
You might write the pattern so that the underscore is excluded from the word characters. \w also matches _, which is what lets the lazy .*? stop at the first underscore on a line with no spaces:
^(.*)_([^\W_]+)$
The pattern matches:
^ Start of string
(.*) Capture group 1, match the whole line (then backtrack to the last _)
_ Match _
([^\W_]+) Capture group 2, match 1+ word chars except for _
$ End of string
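The difference between the asker's pattern and the one above can be checked with a short script. This is a minimal sketch in Python rather than .NET, on the assumption that both engines treat these particular constructs the same way (\w includes the underscore, and a repeated group keeps its last iteration). The helper name split_last is made up for illustration.

```python
import re

# Two sample lines from the question: one with a space, one without.
lines = [
    "2022-366_DA03_Delivery Plan_E",
    "2022-366_DA04_Floorplan_D",
]

original = re.compile(r"^(.*?)_(\w+?)*$")  # pattern from the question
fixed = re.compile(r"^(.*)_([^\W_]+)$")    # pattern suggested in this answer


def split_last(pattern, line):
    """Hypothetical helper: return (group 1, group 2), or None if no match."""
    m = pattern.match(line)
    return (m.group(1), m.group(2)) if m else None


original_results = [split_last(original, line) for line in lines]
fixed_results = [split_last(fixed, line) for line in lines]

for line, before, after in zip(lines, original_results, fixed_results):
    print(line, "->", before, "|", after)
```

On the line without a space, the original pattern splits at the first underscore (2022-366 and D), while the suggested pattern splits at the last one (2022-366_DA04_Floorplan and D). The line with a space gives the same result under both patterns.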
Answer 2
Score: 0
This pattern:
^(.?)_(\w+?)$
does not seem right to me, as there is an additional quantifier on the second group. Furthermore, I don't think you want the '?' quantifier on either of those groups. Here is how I read your regex:
^ ... start of the line
(...) ... first group
.? ... matches any character (.), zero or one time (?)
_ ... matches an underscore (with greedy matching in front of it, this is the last underscore)
(...) ... second group
\w+? ... matches one or more word characters; the trailing question mark makes the + lazy, which is not needed here
$ ... end of the line
Here is what I would have tried:
^(.*)_(.*)$
which translates to:
^ ... start of the line
(...) ... first group
.* ... matches any character (.), zero or more times (*)
_ ... matches the last underscore (because the preceding .* is greedy)
(...) ... second group
.* ... matches any character, zero or more times
$ ... end of the line
This should capture the text after the last underscore in the second group.
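As a rough check of the greedy pattern above, the sketch below rewrites a few of the question's lines into the group1==group2 form. It uses Python instead of .NET, assuming the two engines agree on greedy .* backtracking. The last input line is an invented sample with no underscore, added only to show what happens when nothing matches.

```python
import re

# Greedy first group: .* grabs the whole line, then backtracks to the last "_".
pattern = re.compile(r"^(.*)_(.*)$")

lines = [
    "2022-366_DA00_Cover Sheet_C",
    "2022-366_DA04_Floorplan_D",
    "2022-366_DA07_Drivethru Rear Building Elevations_D",
    "no underscore here",  # invented sample, not from the question
]

# Replace the last underscore with "==" by re-emitting both groups.
converted = [pattern.sub(r"\1==\2", line) for line in lines]

for line in converted:
    print(line)
```

Because the second group is .* rather than a stricter class, it also accepts an empty suffix or one containing spaces. A line without any underscore does not match and is left unchanged.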