2023-06-30 16:34:56 · go


Match any string also containing escaped characters and newlines with Go

Question


A tool written in Go needs to find every (format) string in a file of C or C++ code, including strings that contain escaped characters or newlines. Examples:

..."foo"...
...`foo:"foo"`...
..."foo
foo"...
..."foo\r\nfoo"...
...`foo"foo-

lish`

Matches inside comments or deactivated code are acceptable, so there is no need to exclude those parts.

While searching for a solution, I got the following pattern to work on https://regex101.com/r/FDhldb/1:

/(["'`])(?:(?=(\\?))\2.)*?\1/gms

Unfortunately, this does not compile in Go:

const (
	patFmtString = `(?Us)(["'])(?:(?=(\\?)).)*?`
)

var (
	matchFmtString = regexp.MustCompile(patFmtString)
)

Even the simplified pattern (?Us)(["'])(?:(\\?).)*?\1 fails with "error parsing regexp: invalid escape sequence: \1".

How do I implement this correctly in Go, ideally so that it also runs fast?

Answer 1

Score: 1


Instead of PCRE, you can use a reasonably simple bufio.Scanner with a custom split function:

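import "bufio"

// stringLiterals is a split function for bufio.Scanner. Each token is one
// quoted literal ("...", '...' or `...`), delimiters included.
var stringLiterals bufio.SplitFunc = func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	scanning := false // true while inside a literal
	var delim byte    // the quote character that opened the current literal
	var i int
	var start, end int
	for i < len(data) {
		b := data[i]
		switch b {
		case '\\': // skip escape sequences
			i += 2
			continue
		case '"', '\'', '`':
			if scanning && delim == b {
				// Closing delimiter: emit the literal.
				end = i + 1
				token = data[start:end]
				advance = end
				return
			} else if !scanning {
				// Opening delimiter: remember where the literal starts.
				scanning = true
				start = i
				delim = b
			}
		}
		i++
	}
	if atEOF {
		// No further literal can be completed; consume the rest.
		return len(data), nil, nil
	}
	// Keep everything from the open literal onward and request more data.
	return start, nil, nil
}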

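Then use it like this:

// Requires the fmt, log, and os imports in addition to bufio.
func main() {
	input := os.Stdin // or any other io.Reader
	scanner := bufio.NewScanner(input)
	scanner.Split(stringLiterals)
	for scanner.Scan() {
		stringLit := scanner.Text()
		fmt.Println(stringLit) // do something with stringLit
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}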
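
To see the approach end to end, the following self-contained sketch runs a compact copy of the split function over input modeled on the question's examples. The names splitLiterals and literals are hypothetical, and the 1 MiB buffer cap is an assumption: bufio.Scanner stops with an error when the data it has to hold exceeds its buffer limit (64 KiB by default), so long literals or long stretches without any quote may need a larger limit.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitLiterals is a compact copy of the split function above (hypothetical name).
func splitLiterals(data []byte, atEOF bool) (int, []byte, error) {
	start, scanning := 0, false
	var delim byte
	for i := 0; i < len(data); i++ {
		switch b := data[i]; b {
		case '\\':
			i++ // skip the escaped byte as well
		case '"', '\'', '`':
			if scanning && b == delim {
				return i + 1, data[start : i+1], nil
			}
			if !scanning {
				scanning, start, delim = true, i, b
			}
		}
	}
	if atEOF {
		return len(data), nil, nil
	}
	return start, nil, nil // keep the open literal and ask for more data
}

// literals collects every literal found in src (hypothetical helper).
func literals(src string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(src))
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // assumed cap of 1 MiB
	sc.Split(splitLiterals)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out, sc.Err()
}

func main() {
	src := "a \"foo\" b `foo:\"foo\"` c \"foo\nfoo\" d \"foo\\r\\nfoo\""
	lits, err := literals(src)
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, l := range lits {
		fmt.Printf("%q\n", l)
	}
}
```

Checking the error returned by the scanner matters here, because a buffer overflow otherwise looks the same as reaching the end of the input.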

For your examples, this returns exactly the matches that your regex does, though I'm not sure it corresponds to how the C++ grammar defines string literals.

You can try it out on the playground.
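
If a regular expression is still preferred: Go's regexp package implements RE2 syntax, which supports neither backreferences such as \1 nor lookahead such as (?=...), which is why the patterns in the question do not compile. A possible workaround, sketched here under the assumption that spelling out one alternative per delimiter is acceptable, replaces the backreference with three branches. The name literalRE is hypothetical, and this pattern is not taken from the question or the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// literalRE has one branch per delimiter instead of a \1 backreference.
// Inside a literal, each step consumes either one character that is neither
// the delimiter nor a backslash, or a backslash plus the character after it.
// (?s) lets "." match a newline, so a backslash before a line break works.
var literalRE = regexp.MustCompile("(?s)" +
	`"(?:[^"\\]|\\.)*"` + `|` +
	`'(?:[^'\\]|\\.)*'` + `|` +
	"`(?:[^`\\\\]|\\\\.)*`")

func main() {
	src := "a \"foo\" b `foo:\"foo\"` c \"foo\nfoo\" d \"foo\\r\\nfoo\""
	for _, m := range literalRE.FindAllString(src, -1) {
		fmt.Printf("%q\n", m)
	}
}
```

This can differ from the scanner in edge cases. For instance, the scanner also skips the character after a backslash that appears outside any literal, while the pattern only treats backslashes specially inside one.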
