2022-07-28 14:12:46 · go


regex repeated capturing group captures the last iteration but I need all

Question


Example code:

	var reStr = `"(?:\\"|[^"])*"`
	var reStrSum = regexp.MustCompile(`(?m)(` + reStr + `)\s*\+\s*(` + reStr + `)\s*\+\s*(` + reStr + `)`)
	var str = `"This\nis\ta\\string" +
	"Another\"string" +
	"Third string"
`
	for i, match := range reStrSum.FindAllStringSubmatch(str, -1) {
		fmt.Println(match, "found at index", i)
		for i, str := range match {
			fmt.Println(i, str)
		}
	}

Output:

["This\nis\ta\\string" +
	"Another\"string" +
	"Third string" "This\nis\ta\\string" "Another\"string" "Third string"] found at index 0
0 "This\nis\ta\\string" +
	"Another\"string" +
	"Third string"
1 "This\nis\ta\\string"
2 "Another\"string"
3 "Third string"

That is, it matches the "sum of strings" and captures all three strings correctly.

My problem is that I do not want to match a sum of exactly three strings. I want to match every "sum of strings", where a sum can consist of one or more string literals. I have tried to express this with {0,}:

	var reStr = `"(?:\\"|[^"])*"`
	var reStrSum = regexp.MustCompile(`(?m)(` + reStr + `)` + `(?:\s*\+\s*(` + reStr + `)){0,}`)
	var str = `
test1("This\nis\ta\\string" +
	"Another\"string" +
	"Third string summed");
test2("Second string " + "sum");
`
	for i, match := range reStrSum.FindAllStringSubmatch(str, -1) {
		fmt.Println(match, "found at index", i)
		for i, str := range match {
			fmt.Println(i, str)
		}
	}

Then I get this result:

["This\nis\ta\\string" +
	"Another\"string" +
	"Third string summed" "This\nis\ta\\string" "Third string summed"] found at index 0
0 "This\nis\ta\\string" +
	"Another\"string" +
	"Third string summed"
1 "This\nis\ta\\string"
2 "Third string summed"
["Second string " + "sum" "Second string " "sum"] found at index 1
0 "Second string " + "sum"
1 "Second string "
2 "sum"

Group 0 of the first match contains all three strings (the regexp matches correctly), but there are only two capturing groups in the expression, and the second group only contains the last iteration of the repetition. For example, "Another\"string" is lost in the process and cannot be accessed.

Would it be possible to get all iterations (all repetitions) of group 2 somehow?

I would also accept any workaround that uses nested loops. But please be aware that I cannot simply replace the {0,} repetition with an outer FindAllStringSubmatch call, because the FindAllStringSubmatch call is already used for iterating over "sums of strings". In other words, I must find the first string sum and also the "Second string sum".

Answer 1

Score: 2


I just found a workaround that will work. I can do two passes. In the first pass, I just match all string literals, and replace them with unique placeholders in the original text. Then the transformed text won't contain any strings, and it becomes much easier to do further processing on it in a second pass.

Something like this:

	type javaString struct {
		value  string
		lineno int
	}
	// First we find all string literals.
	var placeholder = "JSTR"
	var reJavaStringLiteral = regexp.MustCompile(`(?m)("(?:\\"|[^"])*")`)
	javaStringLiterals := make([]javaString, 0)
	for _, loc := range reJavaStringLiteral.FindAllStringIndex(strContent, -1) {
		// loc[0] is the byte offset of this match, so a literal that occurs
		// more than once still gets its own line number.
		lineno := strings.Count(strContent[:loc[0]], "\n") + 1
		javaStringLiterals = append(javaStringLiterals, javaString{value: strContent[loc[0]:loc[1]], lineno: lineno})
	}
	// Next, we replace all string literals with placeholders.
	for i, jstr := range javaStringLiterals {
		strContent = strings.Replace(strContent, jstr.value, fmt.Sprintf("%v(%v)", placeholder, i), 1)
	}
	// Now the transformed text does not contain any string literals.

After the first pass, the original text becomes:

		test1(JSTR(0) +
			JSTR(1) +
			JSTR(2));
		test2(JSTR(3) + JSTR(4));

After this step, I can easily look for "JSTR(\d+) + JSTR(\d+) + JSTR(\d+)..." expressions. Now they are easy to find, because the text does not contain any strings (that could otherwise contain practically anything and interfere with regular expressions). These "sum of string" matches can then be re-matched with another FindAllStringSubmatch (in an inner loop) and then I'll get all information that I needed.
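A rough sketch of what that second pass could look like, assuming the placeholders are numbered from zero as in the replacement loop above. The names `reSum`, `reTerm` and `sumIndexes` are made up for this illustration; the outer loop walks over the sums and the inner loop pulls out every placeholder index inside one sum:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// reSum matches one or more JSTR(n) placeholders joined by '+' (hypothetical name).
var reSum = regexp.MustCompile(`JSTR\(\d+\)(?:\s*\+\s*JSTR\(\d+\))*`)

// reTerm captures the numeric index of a single placeholder (hypothetical name).
var reTerm = regexp.MustCompile(`JSTR\((\d+)\)`)

// sumIndexes returns, for every sum found in text, the placeholder
// indexes that the sum is made of, in order of appearance.
func sumIndexes(text string) [][]int {
	var sums [][]int
	// Outer loop: one iteration per "sum of strings".
	for _, sum := range reSum.FindAllString(text, -1) {
		var idx []int
		// Inner loop: one iteration per placeholder inside this sum.
		for _, term := range reTerm.FindAllStringSubmatch(sum, -1) {
			n, err := strconv.Atoi(term[1])
			if err != nil {
				continue // index too large to fit an int; skip it
			}
			idx = append(idx, n)
		}
		sums = append(sums, idx)
	}
	return sums
}

func main() {
	text := "test1(JSTR(0) +\n\tJSTR(1) +\n\tJSTR(2));\ntest2(JSTR(3) + JSTR(4));\n"
	for i, idx := range sumIndexes(text) {
		fmt.Println("sum", i, "uses literals", idx)
	}
	// Prints:
	// sum 0 uses literals [0 1 2]
	// sum 1 uses literals [3 4]
}
```

Each index can then be looked up in javaStringLiterals to recover the literal's value and line number.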

This is not a real solution, because it requires writing a lot of code, it is specific to my concrete use case, and it does not really answer the original question: how to access all iterations of a repeated capturing group.

But the general idea of the workaround might be beneficial for somebody who is facing a similar problem.
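For the simpler case where only the literals themselves are needed, a nested loop without placeholders may also be enough. The following is only a sketch: it reuses the string-literal pattern from the question, and `reStrOne` and `stringSums` are names invented here. The outer regexp matches a whole sum without trying to capture its terms, and the inner regexp is run again on the matched text to collect every literal:

```go
package main

import (
	"fmt"
	"regexp"
)

// reStr is the string-literal pattern from the question.
const reStr = `"(?:\\"|[^"])*"`

// reStrSum matches a whole sum of literals; the repetition is non-capturing.
var reStrSum = regexp.MustCompile(reStr + `(?:\s*\+\s*` + reStr + `)*`)

// reStrOne matches a single literal (hypothetical name).
var reStrOne = regexp.MustCompile(reStr)

// stringSums returns every sum found in src as the list of its literals.
func stringSums(src string) [][]string {
	var sums [][]string
	// Outer loop: one iteration per sum.
	for _, sum := range reStrSum.FindAllString(src, -1) {
		// Inner match: every literal inside this sum, including the
		// middle ones that a repeated capturing group would drop.
		sums = append(sums, reStrOne.FindAllString(sum, -1))
	}
	return sums
}

func main() {
	src := `
test1("This\nis\ta\\string" +
	"Another\"string" +
	"Third string summed");
test2("Second string " + "sum");
`
	for i, literals := range stringSums(src) {
		fmt.Println("sum", i)
		for j, lit := range literals {
			fmt.Println(" ", j, lit)
		}
	}
}
```

This does not make the repeated group itself return all iterations; it sidesteps the limitation by matching twice.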
