Regex not in group

Question

I am trying to extract some text from an EDI file with a regex. I have this text:

UNA:+.? '
UNB+UNOC:3+5790000120420:14+5790000181872:14+991111:1850+KuvertNr1234'
UNH+BrevNr5678+CONTRL:D:93A:ZZ:C0230Q+CTL02'
UCI+MEDREF01095+5790000181872:14+5790000120420:14+4'
UCM+1111MAN01095+MEDREF:D:93A:UN:H0130R+4'
FTX+VER+P00++EDI-brev med nummeret 1111MAN01095, afsendt 11/11 1999 kl 18.46 har \:ikke kunnet modtages. Horsens Sygehus, laboratoriet kan ikke modtage \:sygehushenvisninger. :Med venlig hilsen: IT-Hotline. Horsens Sygehus.Telefon 12345678.'UNT+5+BrevNr5678'UNZ+1+KuvertNr1234'

I need the FTX text, and I have this regex for it: FTX\+([a-zA-Z]{2,4})\+([a-zA-Z0-9]{3})\+\+(.*?)'

But in EDIFACT, ? escapes ', so if I add ?' to the text:

UNA:+.? '
UNB+UNOC:3+5790000120420:14+5790000181872:14+991111:1850+KuvertNr1234'
UNH+BrevNr5678+CONTRL:D:93A:ZZ:C0230Q+CTL02'
UCI+MEDREF01095+5790000181872:14+5790000120420:14+4'
UCM+1111MAN01095+MEDREF:D:93A:UN:H0130R+4'
FTX+VER+P00++EDI-brev med nummeret 1111MAN01095, afsendt 11/11 1999 kl 18.46 har \:ikke kunnet modtages. Horsens Sygehus, laboratoriet kan ikke modtage \:sygehushenvisninger. :Med venlig hilsen: IT-Hotline.?' Horsens Sygehus.Telefon 12345678.'UNT+5+BrevNr5678'UNZ+1+KuvertNr1234'

My regex stops at the ' char right after the ?. How can I keep using .*? but ignore the "?'" in the text?
The EDIFACT message can either contain \n line breaks or be one long string without \n.

I tried: FTX\+([a-zA-Z]{2,4})\+([a-zA-Z0-9]{3})\+\+(.*?)'
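
The behaviour described in the question can be reproduced with a short Python sketch. The input is a shortened, hypothetical fragment modelled on the sample above, and the variable names are illustrative only; the question does not say which regex engine is in use, so Python's `re` module is an assumption here.

```python
import re

# The pattern from the question: the lazy group ends at the first apostrophe.
ORIGINAL = re.compile(r"FTX\+([a-zA-Z]{2,4})\+([a-zA-Z0-9]{3})\+\+(.*?)'")

# Hypothetical, shortened fragment containing an escaped apostrophe (?').
edi = "FTX+VER+P00++Med venlig hilsen: IT-Hotline.?' Horsens Sygehus.'UNT+5+BrevNr5678'"

# The capture is cut off at the escaped apostrophe instead of the real terminator.
truncated = ORIGINAL.search(edi).group(3)
print(truncated)
```

The captured text ends right after the release character ?, so everything between the escaped apostrophe and the real segment terminator is lost.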

Answer 1

Score: 1

You can use

FTX\+([a-zA-Z]{2,4})\+([a-zA-Z0-9]{3})\+\+(.*?)'(?<!\?')

Details:

  • FTX\+ - FTX+ string
  • ([a-zA-Z]{2,4}) - Group 1: two to four ASCII letters
  • \+ - a + char
  • ([a-zA-Z0-9]{3}) - Group 2: three ASCII alphanumeric chars
  • \+\+ - a ++ string
  • (.*?) - Group 3: any zero or more chars other than a newline char, as few as possible
  • '(?<!\?') - a ' char that is not immediately preceded by a ? char.
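
As a rough check of the pattern above, here is a minimal Python sketch. It assumes Python's `re` module, which accepts this fixed-width lookbehind; the input is a shortened, hypothetical fragment modelled on the question's sample, and the variable names are illustrative.

```python
import re

# Pattern from the answer: the closing ' must not be preceded by ?.
PATTERN = re.compile(r"FTX\+([a-zA-Z]{2,4})\+([a-zA-Z0-9]{3})\+\+(.*?)'(?<!\?')")

# Hypothetical, shortened fragment with an escaped apostrophe inside the FTX text.
edi = (
    "UCM+1111MAN01095+MEDREF:D:93A:UN:H0130R+4'"
    "FTX+VER+P00++Med venlig hilsen: IT-Hotline.?' Horsens Sygehus.'"
    "UNT+5+BrevNr5678'"
)

match = PATTERN.search(edi)
qualifier, code, text = match.groups()
print(qualifier, code)
print(text)
```

Here the third group runs past the escaped ?' and stops at the real terminator. Two caveats apply to this sketch: . does not match a line break unless `re.DOTALL` is passed, and a doubled release character (??) directly before a terminator would be misread as an escaped apostrophe.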

Answer 2

Score: 1

Another option could be to exclude matching ' using a negated character class, and only match it when it is directly preceded by a question mark:

FTX\+([a-zA-Z]{2,4})\+([a-zA-Z0-9]{3})\+\+([^']*(?:'(?<=\?.)[^']*)*)'

The last part ([^']*(?:'(?<=\?.)[^']*)*) matches:

  • ( Start of the capture group
    • [^']* Match zero or more chars other than ' (this also matches newlines)
    • (?: Non-capture group, repeated as a whole
      • '(?<=\?.) Match ' and use a positive lookbehind to assert that a ? comes directly before it
      • [^']* Match zero or more chars other than '
    • )* Close the non-capture group and repeat it zero or more times
  • ) Close the capture group
  • ' Match the terminating ' literally
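
The pattern above can be tried out with a short Python sketch. The helper name `extract_ftx` and the input are hypothetical; the input deliberately places a line break inside the free text to show that the negated character class crosses newlines without any extra flag. Python's `re` module is an assumption, since the thread does not name an engine.

```python
import re

# Pattern from the answer: an apostrophe is only consumed inside the text
# when the character directly before it is a question mark.
PATTERN = re.compile(
    r"FTX\+([a-zA-Z]{2,4})\+([a-zA-Z0-9]{3})\+\+([^']*(?:'(?<=\?.)[^']*)*)'"
)


def extract_ftx(edi):
    """Return (qualifier, code, text) tuples for every FTX segment found."""
    return PATTERN.findall(edi)


# Hypothetical fragment: segments separated by \n, plus a \n inside the text.
edi = (
    "UCM+1111MAN01095+MEDREF:D:93A:UN:H0130R+4'\n"
    "FTX+VER+P00++IT-Hotline.?' Horsens\nSygehus.'\n"
    "UNT+5+BrevNr5678'"
)

segments = extract_ftx(edi)
print(segments)
```

Because [^'] matches any character except the apostrophe, the same pattern handles both the single-line and the multi-line form of the message.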

huangapple
  • Posted on May 17, 2023 at 17:21:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76270492.html