Use a JavaScript regex group to match only what the groups before it don't match

Question

Can I use a JavaScript regex to capture whatever the first group(s) don't capture?

Here I have two (or three) capturing groups:

/("[^\n"]*")|("[^\n"]*(?:\n|$))/g

What I would like is to place anything that did not match the first two groups into another group, like so:

/("[^\n"]*")|("[^\n"]*(?:\n|$))|(<CATCH ANYTHING ELSE HERE>)/g

Is this possible?

In practice, this would mean always matching the entire string, but being able to segment it.

(I am building a code editor, trying to parse strings in the code).

I tried adding (.*), but that always seemed to match before the other two groups got a chance.

EDIT:

To simplify, let's say I want to segment a string into two groups:

  1. All characters that are a
  2. All characters that are not a

Given the regex /(a)|(^\1)/g, I would assume the entire string would match, but this is not the case. Why? In more complex cases, I assume using backreferences would be better?

Answer 1

Score: 1

This expression does the job:

(regex1)|(regex2)|((?:(?!regex1|regex2).)*)

where:

  • regex1 is your first regex
  • regex2 is your second regex
  • the last group consumes one character at a time for as long as neither of the first two regexes matches at the current position. The (?!...) construct is called a negative lookahead.
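
As a rough sketch, here is that template applied to the two string patterns from the question. The names closed, unclosed, segmenter and segment are made up for illustration. Two details are my own assumptions rather than part of the template above: the last group uses + instead of * so it never produces an empty match, and [^] instead of . so that newlines outside of strings are also covered.

```javascript
// regex1 and regex2, copied from the question.
const closed = /"[^\n"]*"/;
const unclosed = /"[^\n"]*(?:\n|$)/;

// (regex1)|(regex2)|((?:(?!regex1|regex2)[^])+)
// Assumption: "+" and "[^]" replace "*" and "." from the template (see the text above).
const segmenter = new RegExp(
  `(${closed.source})|(${unclosed.source})|` +
    `((?:(?!${closed.source}|${unclosed.source})[^])+)`,
  "g"
);

// Hypothetical helper: splits text into labelled segments, in order.
function segment(text) {
  const parts = [];
  for (const m of text.matchAll(segmenter)) {
    let kind = "other";
    if (m[1] !== undefined) kind = "closed";
    else if (m[2] !== undefined) kind = "unclosed";
    parts.push({ kind, text: m[0] });
  }
  return parts;
}

const source = 'let a = "hi";\nlet b = "oops\nend';
console.log(segment(source));
```

Joining the text of every segment should give back the original input, which is the "always match the entire string, but segment it" behaviour the question asks for.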

In your example, you can use /(a)|((?:(?!a).)*)/g.
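
A small sketch of how that regex behaves with String.prototype.matchAll; the helper name segments is only illustrative. Because the second group ends in *, it can also match the empty string (for example at the very end of the input), so the sketch drops zero-length matches. Using + instead of * should avoid them in the first place.

```javascript
// The regex from the answer, unchanged.
const re = /(a)|((?:(?!a).)*)/g;

// Hypothetical helper: returns [groupNumber, text] pairs in match order.
function segments(text) {
  return [...text.matchAll(re)]
    .filter((m) => m[0] !== "") // skip the zero-length matches allowed by "*"
    .map((m) => (m[1] !== undefined ? [1, m[1]] : [2, m[2]]));
}

console.log(segments("aabca"));
// [[1, "a"], [1, "a"], [2, "bc"], [1, "a"]]
```

Note that (a) matches one a per match; writing (a+) instead would group runs of a together.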

P.S. Note that your regex is also wrong because ^ matches the beginning of the string. It only means "not" inside a character class, as in [^a].

P.P.S. As pointed out in the comments, the backreference \1 won't work, because it refers to the actual text matched, not to the expression that captured it.
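
To make both points concrete, the sketch below runs the regex from the question next to a variant that uses a negated character class. The names attempt, withClass and matchedTexts are illustrative only. In JavaScript, a backreference to a group that has not captured anything matches the empty string, so (^\1) can only ever produce an empty match at the very start of the input.

```javascript
const attempt = /(a)|(^\1)/g;     // the regex from the question
const withClass = /(a)|([^a]+)/g; // "not a" written as a negated character class

// Hypothetical helper: lists the text of every match, in order.
function matchedTexts(regex, text) {
  return [...text.matchAll(regex)].map((m) => m[0]);
}

console.log(matchedTexts(attempt, "aba"));   // ["a", "a"]      the "b" is never matched
console.log(matchedTexts(attempt, "bab"));   // ["", "a"]       only an empty match at index 0
console.log(matchedTexts(withClass, "bab")); // ["b", "a", "b"] the whole string is covered
```

A negated class only works when "everything else" can be described one character at a time; for multi-character patterns the negative lookahead form is the more general tool.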

huangapple
  • Posted on May 30, 2023, at 01:40:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76359337.html