Can't get a regex to work (.NET) without a space in the text

Question

I'm trying to split a series of lines based on the LAST underscore character.

It's something to do with that one line not having a space in the text, and I'm not familiar with regex concepts like lookahead and lookbehind.

Using:

^(.*?)_(\w+?)*$

Run against:

2022-366_DA00_Cover Sheet_C
2022-366_DA01_Locality Plan_E
2022-366_DA02_Site Plan_H
2022-366_DA03_Delivery Plan_E
2022-366_DA04_Floorplan_D
2022-366_DA05_Roof Plan_D
2022-366_DA06_Front  Side Building Elevations_F
2022-366_DA07_Drivethru  Rear Building Elevations_D
2022-366_DA08_External Finishes Schedule_A

Produces:

2022-366_DA00_Cover Sheet==C
2022-366_DA01_Locality Plan==E
2022-366_DA02_Site Plan==H
2022-366_DA03_Delivery Plan==E
**2022-366==D**
2022-366_DA05_Roof Plan==D
2022-366_DA06_Front  Side Building Elevations==F
2022-366_DA07_Drivethru  Rear Building Elevations==D
2022-366_DA08_External Finishes Schedule==A

Answer 1

Score: 1

You do not need to repeat the capture group, because \w+ already matches one or more word characters. (Adding * after the group also makes the whole group optional.)

Neither \w+? nor .*? needs to be non-greedy.

If you only want to match a single uppercase letter A-Z, you could also use [A-Z] instead of \w+.
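As a rough illustration of why the question's pattern trips on the one line without a space, here is a minimal sketch. It uses Python's re module rather than .NET (an assumption made for the example; the constructs involved, lazy and greedy quantifiers and \w, behave the same way in both engines), and split_original is a hypothetical helper name.

```python
import re

# The pattern from the question: lazy first group, repeated lazy second group.
ORIGINAL = re.compile(r'^(.*?)_(\w+?)*$')


def split_original(line: str) -> str:
    """Replace the matched underscore with '==' using the question's pattern."""
    return ORIGINAL.sub(r'\1==\2', line)


# "Floorplan" has no space, so everything after the FIRST underscore consists
# of word characters (\w also matches '_'). The lazy .*? can stop right there,
# and the repeated group only keeps its last one-character iteration.
print(split_original('2022-366_DA04_Floorplan_D'))   # 2022-366==D

# Here the space in "Roof Plan" is not a word character, so the engine has to
# keep extending .*? until only "D" remains after the underscore.
print(split_original('2022-366_DA05_Roof Plan_D'))   # 2022-366_DA05_Roof Plan==D
```

So the space was only masking the problem: the pattern never really targeted the last underscore.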

You could write the pattern so that the underscore is excluded from the word characters:

^(.*)_([^\W_]+)$

The pattern matches:

  • ^ Start of string
  • (.*) Capture group 1, match as many characters as possible, backtracking to the last _ that lets the rest of the pattern match
  • _ Match _
  • ([^\W_]+) Capture group 2, match 1+ word chars except for _
  • $ End of string

See a regex demo
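A minimal sketch of the pattern above applied to some of the sample lines from the question. It is written with Python's re module instead of .NET (an assumption for the sake of a runnable example; [^\W_] is supported by both), and split_last is a hypothetical helper name.

```python
import re

# Greedy first group, then the last underscore, then word chars except '_'.
LAST_UNDERSCORE = re.compile(r'^(.*)_([^\W_]+)$')


def split_last(line: str) -> str:
    """Replace the last underscore with '==' when the line fits the pattern."""
    return LAST_UNDERSCORE.sub(r'\1==\2', line)


samples = [
    '2022-366_DA00_Cover Sheet_C',
    '2022-366_DA04_Floorplan_D',
    '2022-366_DA06_Front  Side Building Elevations_F',
]

for sample in samples:
    print(split_last(sample))
# 2022-366_DA00_Cover Sheet==C
# 2022-366_DA04_Floorplan==D
# 2022-366_DA06_Front  Side Building Elevations==F
```

Lines that do not end in an underscore followed by such characters are returned unchanged.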

Answer 2

Score: 0

This:

^(.?)_(\w+?)$

does not seem right to me because of the additional quantifier on the second group. Furthermore, I don't think you want the '?' quantifier on either of those groups. Here is what I read from your regex:

^     ... start of the line
(...) ... first group
.?    ... every character matches(.), if there is exactly zero or one of them(?)
_     ... match an underscore(due to default greedy matching, this will match on the last underscore)
(...) ... second group
\w+?  ... any word character will match if there is one or more of them. the question mark after the + makes the match lazy, which is not needed here.
$     ... end of the line

Here is what I would have tried:

^(.*)_(.*)$

Which translates to:

^     ... start of the line
(...) ... first group
.*    ... match every character (.) if there is any (*)
_     ... matches the last(remember the greedy thing) underscore
(...) ... second group
.*    ... match every character if there is any
$     ... end of the line

This should put the text after the last underscore into the second group.
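A short sketch of this looser pattern, again using Python's re module as a stand-in for .NET (an assumption for illustration; split_loose is a hypothetical helper name). It returns both groups so the effect of the greedy first .* is visible.

```python
import re
from typing import Optional, Tuple

# Greedy first group, so the literal '_' ends up on the last underscore.
LOOSE = re.compile(r'^(.*)_(.*)$')


def split_loose(line: str) -> Optional[Tuple[str, str]]:
    """Return (before, after) around the last underscore, or None if absent."""
    match = LOOSE.match(line)
    return match.groups() if match else None


print(split_loose('2022-366_DA04_Floorplan_D'))
# ('2022-366_DA04_Floorplan', 'D')

# Unlike a pattern that requires one or more characters in group 2,
# (.*) also accepts an empty suffix.
print(split_loose('trailing_'))
# ('trailing', '')
```

Because the second group accepts anything, including nothing, this version is more permissive than one that restricts the suffix to word characters.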

huangapple
  • Posted on April 13, 2023, 17:35:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76003912.html