2021-12-06 21:32:35 · go

How to stop a regexp from over-matching (greedy match) in Go?

Question

This is an example of a multiple-choice question. I want to extract the Chinese text of each option ("英国、法国", "加拿大、墨西哥", "葡萄牙、加拿大", "墨西哥、德国") from the text in the following Go code, but it does not work.

package main

import (
	"fmt"
	"regexp"
	"testing"
)

func TestRegex(t *testing.T) {
	text := `（ B ）38.目前，亚马逊美国站后台，除了有美国站点外，还有(    )站点。
A.英国、法国B.加拿大、墨西哥
C.葡萄牙、加拿大D.墨西哥、德国
`

	fmt.Printf("%q\n", regexp.MustCompile(`[A-E]\.(\S+)?`).FindAllStringSubmatch(text, -1))
	fmt.Printf("%q\n", regexp.MustCompile(`[A-E]\.`).Split(text, -1))
}

text:

（ B ）38.目前，亚马逊美国站后台，除了有美国站点外，还有(    )站点。
A.英国、法国B.加拿大、墨西哥
C.葡萄牙、加拿大D.墨西哥、德国

pattern: [A-E]\.(\S+)?

Actual result: [["A.英国、法国B.加拿大、墨西哥" "英国、法国B.加拿大、墨西哥"] ["C.葡萄牙、加拿大D.墨西哥、德国" "葡萄牙、加拿大D.墨西哥、德国"]].

Expected result: [["A.英国、法国" "英国、法国"] ["B.加拿大、墨西哥" "加拿大、墨西哥"] ["C.葡萄牙、加拿大" "葡萄牙、加拿大"] ["D.墨西哥、德国" "墨西哥、德国"]]

I think this might be a greedy-matching problem, because my code reads option A and option B together as a single option.

Answer 1

Score: 1

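Non-greedy matching alone won't solve this: a lazy (\S+?) with nothing after it stops after a single character. What you need is a positive lookahead that stops the match just before the next label, and RE2 (the syntax used by Go's regexp package) doesn't support lookaheads.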

As a workaround, you can search for the labels only and extract the text between them manually.

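// Find the byte offsets of every label ("A.", "B.", ...).
re := regexp.MustCompile(`[A-E]\.`)
res := re.FindAllStringIndex(text, -1)
results := make([][]string, len(res))
for i, m := range res {
	if i < len(res)-1 {
		// Option text runs from the end of this label to the start of the next one.
		results[i] = []string{text[m[0]:m[1]], text[m[1]:res[i+1][0]]}
	} else {
		// The last option runs to the end of the text.
		results[i] = []string{text[m[0]:m[1]], text[m[1]:]}
	}
}

fmt.Printf("%q\n", results)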

This should print:

[["A." "英国、法国"] ["B." "加拿大、墨西哥\n"] ["C." "葡萄牙、加拿大"] ["D." "墨西哥、德国\n"]]
