Can regex identify characters interspersed with a limit?

Question

I am new to using regex but I feel my pattern may be too complex.

I am looking for a pattern with a minimum number of brackets and a maximum number of dots interspersed. I can't see a way for regex to count the number of dots across the whole pattern rather than within each consecutive run.

For example:

...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............

If I want to identify a run of at least 25 (s with a maximum of 15 .s interspersed from the first ( to the last:

<pre>
...<b>((((((((.(((..((..((((.(((((((.(..(((((.(((.(((</b>...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............
</pre>

My regex is currently searching for a sequence with a maximum of 15 consecutive .s instead.

Is this possible? If not, should I be using an alternative (e.g. pyparsing)?

This is what I have so far:

(\.{0,15}\(){25,}
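
To see why this pattern limits consecutive dots rather than the total, here is a small check in Python. The sample string is made up for illustration and is not taken from the data above: each of its 25 brackets is preceded by a single dot, so it holds 25 dots overall, yet the pattern still accepts it because \.{0,15} starts counting afresh before every bracket.

```python
import re

# The pattern from the question: up to 15 dots before EACH bracket, 25+ times.
pattern = re.compile(r"(\.{0,15}\(){25,}")

# Hypothetical sample: 25 brackets, each preceded by one dot (25 dots in total).
sample = ".(" * 25

match = pattern.fullmatch(sample)
total_dots = sample.count(".")

print(match is not None)  # whether the pattern accepts the whole sample
print(total_dots)         # total number of dots in the sample
```

The sample exceeds the intended overall limit of 15 dots but never has more than one dot in a row, which is all the quantifier checks.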

Answer 1

Score: 2

Based on @Freeman's idea combining regex and string operations:

import re

s = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"

pattern = r"\([\(.]+\(" # this pattern starts and ends with '(' and may only contain '(' and '.'

matches = re.findall(pattern, s) # find all such patterns

for match in matches:
    print(match)
    print(f"('s in match: {match.count('(')}")  # count characters of each type in the match
    print(f".'s in match: {match.count('.')}")

Output:

((((((((.(((..((..((((.(((((((.(..(((((.(((.(((
('s in match: 36
.'s in match: 11
(((.((.(((((...((
('s in match: 12
.'s in match: 5

With the counts you can easily filter out matches according to your specific requirements.
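
As a rough sketch of that filtering step, assuming the thresholds from the question (at least 25 brackets, at most 15 dots), the counts can be applied directly. The names MIN_BRACKETS, MAX_DOTS and kept are introduced here for illustration only.

```python
import re

s = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"

# Thresholds taken from the question; the names are illustrative.
MIN_BRACKETS = 25
MAX_DOTS = 15

# Same pattern as above: a run of '(' and '.' that starts and ends with '('.
pattern = r"\([\(.]+\("

# Keep only the runs that satisfy both limits.
kept = [
    m for m in re.findall(pattern, s)
    if m.count("(") >= MIN_BRACKETS and m.count(".") <= MAX_DOTS
]

for m in kept:
    print(m)
```

Note that this judges each maximal run as a whole. If a run contained too many dots overall, a shorter stretch inside it that still qualifies would not be reported.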

Answer 2

Score: 2

You could use:

\((?!(?:\(*\.){16})(?:\.*\(){24,}

The pattern matches:

  • \( Match (
  • (?!(?:\(*\.){16}) Negative lookahead: assert that the text to the right of the current position does not contain 16 dots with only optional ( chars in between
  • (?:\.*\(){24,} Repeat 24 or more times: match optional dots followed by a single (

If you want to allow trailing dots:

\((?!(?:\(*\.){16}\.*\()(?:\.*\(){24,}
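
The pattern above can be tried from Python roughly as follows. This is a minimal sketch using the sample string from the question; the variable names are chosen here for illustration.

```python
import re

s = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"

# The "allow trailing dots" variant from this answer.
pattern = re.compile(r"\((?!(?:\(*\.){16}\.*\()(?:\.*\(){24,}")

m = pattern.search(s)
if m:
    run = m.group()
    print(m.start())       # index of the first '(' of the run
    print(run)             # the matched run, from first '(' to last '('
    print(run.count("("))  # number of brackets in the run
    print(run.count("."))  # number of dots in the run
```

Because the final quantifier is greedy, the match extends to the last ( of the run, not just to the 25th one.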

Answer 3

Score: 1

I think you can use a combination of regex and string manipulation in Python, for example:

import re

# sample data
text = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"

# match each innermost "(...)" pair, i.e. brackets with no other brackets inside
pattern = r"\([^()]*\)"

# find all matches of the pattern
matches = re.findall(pattern, text)

# iterate through the matches; stop at the first one with at most 15 dots
for match in matches:
    dot_count = match.count(".")
    if dot_count <= 15:
        print("Pattern matched!")
        break
else:
    print("Pattern not matched.")

Output:

Pattern matched!

Answer 4

Score: 1

Here's a stupid pure-regex approach that finds all 12 matches, with "the whole match" stored in group 2:

\(                       # Match a '('
(?=                      # then lookahead to
  (?:\.*\(){24}          # the 25th bracket
  (.*)                   # and capture anything after that until the end.
)                        # 
(?<=                     # Take a step back behind the first '('
  (?=                    # then assure that there are
    (                    # 
      \(*                # 
      (?:\.\(*){0,15}    # no more than 15 dots from there
    )                    # until
    \1$                  # the group 1 we captured.
  )                      # 
  \(                     # 
)                        # 

Try it on regex101.com.

The main idea is to use the last bracket in the series as the limit, then check the number of dots between those two. The lookbehind is actually not needed for verifying, only for capturing.

The rest is the job of .finditer():

for match_no, match in enumerate(regex.finditer(text), 1):
  print(f'{match_no = }')
  print(f'index = {match.start(0)}')
  print(f'{match[2] = !r}\n')

Try it:

match_no = 1
index = 3
match[2] = '((((((((.(((..((..((((.(((((((.('

...

match_no = 12
index = 17
match[2] = '((..((((.(((((((.(..(((((.(((.((('
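
For completeness, here is one way the pieces above might fit together in a single script. It is a sketch under two assumptions that the answer does not spell out: that regex is the pattern compiled with re.VERBOSE, and that the backreference \1 sits directly before the final $. The name matches is introduced here for illustration.

```python
import re

text = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"

# Verbose mode lets the layout and comments of the pattern be kept as-is.
# Assumption: "\1$" is the intended form of the line before the last closing groups.
regex = re.compile(r"""
\(                       # Match a '('
(?=                      # then look ahead to
  (?:\.*\(){24}          # the 25th bracket
  (.*)                   # and capture everything after it (group 1).
)
(?<=                     # Step back behind the first '('
  (?=                    # and assert that there are
    (                    # group 2: the run from the first to the 25th bracket
      \(*
      (?:\.\(*){0,15}    # no more than 15 dots from there
    )
    \1$                  # until the tail captured in group 1.
  )
  \(
)
""", re.VERBOSE)

matches = list(regex.finditer(text))

print(len(matches))                          # number of starting positions found
print(matches[0].start(), matches[0][2])     # first match: index and run
print(matches[-1].start(), matches[-1][2])   # last match: index and run
```

Each match itself consumes only the opening (, so overlapping runs are reported one starting position at a time, and the run of interest is read from group 2.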

huangapple
  • Posted on August 10, 2023 at 21:44:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76876323.html