Regex alphanumeric dynamic block grouping

Question

I have tried several ways to write a regular expression that splits each line into two groups, but there is one variation I don't know how to handle:

Input lines:

589011 T CANNELLE FLAVOR 8
538160 TP0597C ORANGE FLAVOR
557137 APL0397C STRAWBERRY FLAVOR
556137 APL0397C STRAWBERRY FLAVOR
545129 1APL0397C MANGO FLAVOR
984320 TRANS DECENAL
533160 TP0597C ORANGE FLAVOR
530373 A PINEAPPLE FLAVOR
059311 A STRAWBERRY FLAVOR
508142 T LEMON FLAVOR
547261 A BLACKBERRY FLAVOR
536564 2T BLUEBERRY STRAWBERRY
055406 A MILK FLAVOR
054945 A BLACKBERRY FLAVOR
983347 SUSTANE
902040 ACETIC ACID

Group 1:

589011 T
538160 TP0597C
557137 APL0397C
556137 APL0397C
545129 1APL0397C
984320
533160 TP0597C
530373 A
059311 A
508142 T
547261 A
536564 2T
055406 A
054945 A
983347
902040

Group 2:

CANNELLE FLAVOR
ORANGE FLAVOR
STRAWBERRY FLAVOR
STRAWBERRY FLAVOR
MANGO FLAVOR
TRANS DECENAL
ORANGE FLAVOR
PINEAPPLE FLAVOR
STRAWBERRY FLAVOR
LEMON FLAVOR
BLACKBERRY FLAVOR
BLUEBERRY STRAWBERRY
MILK FLAVOR
BLACKBERRY FLAVOR
SUSTANE
ACETIC ACID

Regexes I have tried:

  1. (\d+\s?[A-Z0-9]*)\s+([A-Z\s]+)\s
  2. ^([0-9\s]+(?:[A-Z0-9]+)?)\s(?:[A-Z\s]+)

The difficulty is with the following two lines, where no code follows the six digits. The groups should be like this:

988320 TRANS DECENAL 1 KGM 12164500 JC4 20.0000 KILO 263.98/KG 5,279.60
903040 ACETIC ACID 1 KGM 12164500 DJ4 100.0000 KILO 3.17/KG 317.00

Group 1:

988320
903040

Group 2:

TRANS DECENAL
ACETIC ACID

and not like that:

Group 1:

988320 TRANS
903040 ACETIC

Group 2:

DECENAL
ACID
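
The unwanted split can be reproduced with a minimal sketch that runs the first attempted pattern against the two problem lines. Python's `re` module and `re.search` are assumptions here; the question does not say which regex engine is in use.

```python
import re

# First pattern from the question, unchanged.
ATTEMPT_1 = re.compile(r"(\d+\s?[A-Z0-9]*)\s+([A-Z\s]+)\s")

lines = [
    "988320 TRANS DECENAL 1 KGM 12164500 JC4 20.0000 KILO 263.98/KG 5,279.60",
    "903040 ACETIC ACID 1 KGM 12164500 DJ4 100.0000 KILO 3.17/KG 317.00",
]

# [A-Z0-9]* consumes the first word of the name, because nothing in the
# pattern requires the optional token to be a code rather than a plain word.
attempt_1_groups = [ATTEMPT_1.search(line).groups() for line in lines]

for groups in attempt_1_groups:
    print(groups)
```

Group 1 comes out as "988320 TRANS" and "903040 ACETIC", which is exactly the split described above.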

Answer 1

Score: 1

^(\d+\s?\b(?:[A-Z]|[A-Z]*\d[A-Z\d]*)?\b)\s*([A-Z ]*\b)

Group 1 matches the digits, optionally followed by either a single-letter word or a word that mixes letters and digits and contains at least one digit. Group 2 captures all the letter-only words after that, excluding the whitespace that separates it from group 1.
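
As a rough check, the pattern can be wrapped in a small Python helper. The helper name `split_with_word_boundaries` and the use of Python's `re` module are assumptions for illustration, not part of the answer. Either group can end with a trailing space (group 1 when no code follows the digits, group 2 when a digit follows the name), so the sketch strips both.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r"^(\d+\s?\b(?:[A-Z]|[A-Z]*\d[A-Z\d]*)?\b)\s*([A-Z ]*\b)")


def split_with_word_boundaries(line):
    """Return (group 1, group 2) with surrounding whitespace removed, or None."""
    match = PATTERN.match(line)
    if match is None:
        return None
    # Strip both groups: the raw captures may carry a trailing space.
    return match.group(1).strip(), match.group(2).strip()


for sample in (
    "589011 T CANNELLE FLAVOR 8",
    "545129 1APL0397C MANGO FLAVOR",
    "988320 TRANS DECENAL 1 KGM 12164500 JC4 20.0000 KILO 263.98/KG 5,279.60",
):
    print(split_with_word_boundaries(sample))
```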

Answer 2

Score: 0

You can try:

(\d{6}\s(?:[A-Z]\b|(?=\S*\d)[A-Z\d]+)?)\s*(.*)

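This will match:

  • GROUP 1: six digits and a whitespace character, optionally followed by either a single uppercase letter or a run of uppercase letters and digits that contains at least one digit
  • GROUP 2: the rest of the line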
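
A short Python sketch shows what "the rest of the line" means in practice. The helper name `split_keep_rest` and the choice of Python's `re` module are assumptions. Because `\s` after the digits is mandatory and sits inside group 1, the sketch strips group 1; group 2 is returned untouched.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r"(\d{6}\s(?:[A-Z]\b|(?=\S*\d)[A-Z\d]+)?)\s*(.*)")


def split_keep_rest(line):
    """Return (group 1 stripped, group 2 as captured), or None if no match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2)


for sample in (
    "538160 TP0597C ORANGE FLAVOR",
    "530373 A PINEAPPLE FLAVOR",
    "903040 ACETIC ACID 1 KGM 12164500 DJ4 100.0000 KILO 3.17/KG 317.00",
):
    print(split_keep_rest(sample))
```

For the longer invoice-style line, group 2 still contains the quantity and price columns, so extra trimming would be needed to get just "ACETIC ACID".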

Answer 3

Score: 0

I have assumed that Group 1 and Group 2 are formed using the following decision rule.

  • The string must begin with six digits (string "1"), followed by a space, followed by a string of one or more capital letters and digits (string "2");
  • Group 1 consists of string 1, a space, and string 2 if string 2 is followed by two instances of a space followed by a word of capital letters, each such word being followed by a space or the end of the string; otherwise Group 1 consists of string 1 only; and
  • Group 2 consists of the word of capital letters that follows the space after Group 1, plus zero or more instances of a space followed by a word of capital letters, as many such instances as possible.

One can do that using the following regular expression, where capture groups 1 and 2 will contain Groups 1 and 2, respectively (provided there is a match, of course).

^(\d{6}(?: [A-Z\d]+(?=(?: [A-Z][A-Z\d]*){2}))?) ([A-Z]+(?: [A-Z]+)*)


This regular expression can be broken down as follows.

^
(                 # begin capture group 1
  \d{6}           # match 6 digits
  (?:             # begin a non-capture group
    [ ]           # match a space
    [A-Z\d]+      # match one or more (+) uppercase letters or digits
    (?=           # begin a positive lookahead
      (?:         # begin a non-capture group
        [ ][A-Z]  # match a space followed by an uppercase letter
        [A-Z\d]*  # match zero or more (*) uppercase letters or digits
      ){2}        # end non-capture group and execute it twice
    )             # end positive lookahead
  )?              # end non-capture group and make it optional
)                 # end capture group 1
[ ]               # match a space
(                 # begin capture group 2
  [A-Z]+          # match one or more (+) uppercase letters
  (?:             # begin a non-capture group
    [ ]           # match a space
    [A-Z]+        # match one or more (+) uppercase letters
  )*              # end non-capture group and execute it zero or more times
)                 # end capture group 2

In the above I have represented each space character as a character class containing a space merely to make the space visible.
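
The commented layout can be run almost as written in an engine with a free-spacing mode. The following is a sketch assuming Python's `re.VERBOSE` flag, where whitespace inside a character class such as `[ ]` is kept while other whitespace in the pattern is ignored. The helper name `split_verbose` and the input "123456 FOO BAR BAZ" are hypothetical, added only for illustration.

```python
import re

# Same expression as above, laid out in free-spacing mode.
PATTERN = re.compile(
    r"""
    ^
    (                          # capture group 1
      \d{6}                    # six digits
      (?:
        [ ][A-Z\d]+            # a space and a candidate code token
        (?=                    # kept only if two more words follow
          (?:[ ][A-Z][A-Z\d]*){2}
        )
      )?
    )
    [ ]                        # the space between the two groups
    (                          # capture group 2
      [A-Z]+(?:[ ][A-Z]+)*     # one or more letter-only words
    )
    """,
    re.VERBOSE,
)


def split_verbose(line):
    """Return (group 1, group 2), or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groups() if match else None


for sample in (
    "589011 T CANNELLE FLAVOR 8",
    "988320 TRANS DECENAL 1 KGM 12164500 JC4 20.0000 KILO 263.98/KG 5,279.60",
    "983347 SUSTANE",
    "123456 FOO BAR BAZ",  # hypothetical: a three-word name with no code
):
    print(split_verbose(sample))
```

No stripping is needed here because the separating space sits outside both groups. The hypothetical last line shows a consequence of the assumed rule: with a three-word name and no code, the first word ends up in group 1.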

huangapple
  • Posted on July 28, 2023, 01:03:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76781996.html