Golang regex: Ignore multiple occurrences

Question

I have a simple requirement.
Given this input string: 10 20 30 40 65 45 44 67 100 200 65 40 66 88 65

I need to get all the numbers between 65 and 66.
The problem arises when a delimiter occurs more than once.
With a regex like (65).+(66), I capture 65 45 44 67 100 200 65 40 66, but I would like to get only 40.

How can I achieve this?

https://regex101.com/r/9HoKxr/1
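
For reference, here is a minimal sketch that reproduces the over-capture in Go. The firstMatch helper and the input constant are illustrative names, not part of the original post. It also suggests that switching to a lazy .+? does not help on its own, because the match still begins at the leftmost 65:

```go
package main

import (
	"fmt"
	"regexp"
)

// input is the sample string from the question.
const input = "10 20 30 40 65 45 44 67 100 200 65 40 66 88 65"

// firstMatch is a hypothetical helper: it returns the leftmost match of
// pattern in s, or "" if there is none.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// Greedy: .+ extends to the last 66 it can reach.
	fmt.Printf("%q\n", firstMatch(`65.+66`, input))
	// Lazy: .+? stops at the first 66, but still starts at the first 65.
	fmt.Printf("%q\n", firstMatch(`65.+?66`, input))
	// Both lines print "65 45 44 67 100 200 65 40 66".
}
```

So the quantifier is not the real issue here; the pattern has to refuse to step over a second 65.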

Answer 1

Score: 5

It sounds like you want to exclude any '65' from the numbers matched up to the first occurrence of '66'. It's a bit verbose, but what about:

\b65((?:\s(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,}))+?)\s66\b

See an online demo


  • \b65 - Match '65' preceded by a word boundary;
  • ( - Open the capture group;
    • (?:\s - Open a non-capture group that starts with a whitespace char;
    • (?:\d|[1-57-9]\d|6[0-47-9]|\d{3,}) - Nested non-capture group that matches any integer except '65' and '66';
    • )+?) - Close the non-capture group and match it at least once, but as few times as possible. Then close the capture group;
  • \s66\b - Match another whitespace char followed by '66' and a word boundary.
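
To see what the nested alternation accepts, it can be tested on single tokens. This is only a sketch: the ^...$ anchors and the names numberButNot6566 and allowed are my additions for illustration and do not appear in the pattern above.

```go
package main

import (
	"fmt"
	"regexp"
)

// numberButNot6566 is the inner alternation from the pattern, anchored here
// (an assumption made for this demo) so that it must match a whole token.
var numberButNot6566 = regexp.MustCompile(`^(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,})$`)

// allowed reports whether token may appear between the 65 and 66 delimiters.
func allowed(token string) bool {
	return numberButNot6566.MatchString(token)
}

func main() {
	for _, tok := range []string{"7", "40", "64", "65", "66", "67", "100", "650"} {
		fmt.Printf("%-3s %v\n", tok, allowed(tok))
	}
	// Only 65 and 66 report false; every other token in the list reports true.
}
```

The branches cover one digit, two digits not starting with 6, two digits starting with 6 other than 65 and 66, and three or more digits.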

Note:

  • The capture keeps its leading space, so we strip it with the Trim() function from the strings package;
  • In my example I used '10 20 30 40 65 45 44 40 66 200 65 40 66 88 65', which returns multiple matches. In such a case it has been established that the OP is looking for the 'shortest' matching substring;
  • By 'shortest' I mean the substring with the fewest elements when it is split on whitespace (using the Fields() function from the same strings package). Therefore '123456' is preferred over '1 2 3' despite being the longer substring in terms of characters.
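
The difference between the two notions of length can be checked directly. A small sketch, where fieldCount is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// fieldCount returns the number of whitespace-separated elements in s.
func fieldCount(s string) int {
	return len(strings.Fields(s))
}

func main() {
	a, b := "123456", "1 2 3"
	fmt.Println(len(a), fieldCount(a)) // 6 characters, 1 element
	fmt.Println(len(b), fieldCount(b)) // 5 characters, 3 elements
}
```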

Try:

package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	s := `10 20 30 40 65 45 44 40 66 200 65 40 66 88 65`
	// Capture group 1 holds the numbers between a '65' and the next '66'.
	re := regexp.MustCompile(`\b65((?:\s(?:\d|[1-57-9]\d|6[0-47-9]|\d{3,}))+?)\s66\b`)
	matches := re.FindAllStringSubmatch(s, -1) // Retrieve all non-overlapping matches

	shortest := ``
	for _, m := range matches {
		// Keep the capture with the fewest space-separated elements.
		if shortest == `` || len(strings.Fields(m[1])) < len(strings.Fields(shortest)) {
			shortest = strings.Trim(m[1], ` `) // Strip the leading space
		}
	}
	fmt.Println(shortest) // Prints an empty line if nothing matched
}

Try it for yourself here.

huangapple
  • Posted on October 13, 2022, 17:39:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74053604.html