August 5, 2021, 09:51:36 · go

unsupported Perl syntax: `(?<`

Question

I want to parse the output of the command `gpg --list-keys` and display it in the browser.
The command output looks like this:


    pub   rsa3072 2021-08-03 [SC] [expires: 2023-08-03]
          07C47E284765D5593171C18F00B11D51A071CB55
    uid           [ultimate] user1 <user1@example.com>
    sub   rsa3072 2021-08-03 [E] [expires: 2023-08-03]
    
    pub   rsa3072 2021-08-04 [SC]
          37709ABD4D96324AB8CBFC3B441812AFBCE7A013
    uid           [ultimate] user2 <user2@example.com>
    sub   rsa3072 2021-08-04 [E]

I expect something like this:


    {
    	{uid : user1@example.com},
    	{uid : user2@example.com},
    }

Here is the code:

    type GPGList struct {
    	uid string
    }

    // find list keys
    func Findlistkeys() {
    	pathexec, _ := exec.LookPath("gpg")
    	cmd := exec.Command(pathexec, "--list-keys")
    	cmdOutput := &bytes.Buffer{}
    	cmd.Stdout = cmdOutput
    	printCommand(cmd)
    	err := cmd.Run()
    	printError(err)
    	output := cmdOutput.Bytes()
    	printOutput(output)
    	GPG := GPGList{}
    	parseOutput(output, &GPG)
    	fmt.Println(GPG)
    }

    func printCommand(cmd *exec.Cmd) {
    	fmt.Printf("==> Executing: %s\n", strings.Join(cmd.Args, " "))
    }

    func printError(err error) {
    	if err != nil {
    		os.Stderr.WriteString(fmt.Sprintf("==> Error: %s\n", err.Error()))
    	}
    }

    func printOutput(outs []byte) {
    	if len(outs) > 0 {
    		fmt.Printf("==> Output: %s\n", string(outs))
    	}
    }

    func parseOutput(outs []byte, GPG *GPGList) {
    	var uid = regexp.MustCompile(`(?<=\<)(.*?)(?=\>)`)
    	fmt.Println(uid)
    }

It ends with the following message:

    panic: regexp: Compile(`(?<=\<)(.*?)(?=\>)`): error parsing regexp: invalid or unsupported Perl syntax: `(?<`

So far I'm stuck on the regex.
I don't understand why it won't compile.
What is wrong with it?

I've tested the regex in an online simulator and it looks OK, yet something is wrong with it here.
Any suggestions, please?

Answer 1

Score: 3

The `regexp` package uses the syntax accepted by RE2. From https://github.com/google/re2/wiki/Syntax:

> (?<=re) after text matching re (NOT SUPPORTED)

Hence the error message:

> error parsing regexp: invalid or unsupported Perl syntax: (?<

The online simulator is likely testing a different regular expression syntax. You will need to find an alternative regular expression encoding or a different regular expression package.
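
A quick way to check whether Go accepts a pattern, as a minimal sketch: `regexp.Compile` returns the problem as an error value instead of panicking the way `regexp.MustCompile` does. The helper name `compiles` is made up for this illustration, and the exact error text may differ between Go versions.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package (RE2 syntax) accepts the pattern.
// The name is hypothetical, chosen only for this sketch.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Lookbehind/lookahead version from the question: rejected.
	fmt.Println(compiles(`(?<=\<)(.*?)(?=\>)`))
	// Plain capture-group version: accepted.
	fmt.Println(compiles(`\<([^\>]*)\>`))
}
```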

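An alternative encoding you can try is `\<([^\>]*)\>` (playground). It is quite simple and may not match your original intent: unlike the lookaround version, the overall match includes the angle brackets, so the address itself is in capture group 1.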
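
Reading the capture group could look roughly like this. It is a sketch rather than the linked playground code: the helper name `extractAddresses` and the sample input are assumptions, and the pattern is written without the optional backslashes.

```go
package main

import (
	"fmt"
	"regexp"
)

// angle matches text enclosed in angle brackets; group 1 is the text inside.
var angle = regexp.MustCompile(`<([^>]*)>`)

// extractAddresses (hypothetical helper) returns the contents of every
// <...> pair found in the input.
func extractAddresses(input string) []string {
	var out []string
	for _, m := range angle.FindAllStringSubmatch(input, -1) {
		out = append(out, m[1]) // m[0] is the full match, m[1] the first group
	}
	return out
}

func main() {
	sample := "uid           [ultimate] user1 <user1@example.com>\n" +
		"uid           [ultimate] user2 <user2@example.com>\n"
	fmt.Println(extractAddresses(sample)) // [user1@example.com user2@example.com]
}
```

Using `FindAllStringSubmatch` avoids trimming the brackets afterwards, since the group already excludes them.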

Answer 2

Score: 1

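Here is another solution, based on the machine-readable output of `gpg --list-keys --with-colons`.

It is still a slow solution, but it is easy to write and easy to update, and it does not use regular expressions.

A clever person could come up with an even faster solution without adding a wall of complexity: just loop over the string until `<`, then capture everything up to `>`.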
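
That loop might be sketched as follows. The function name `scanAngle` is an assumption, and the sketch takes the whole input as one string rather than a stream.

```go
package main

import (
	"fmt"
	"strings"
)

// scanAngle (hypothetical helper) returns every substring enclosed in <...>,
// using only strings.IndexByte: no regexp and no CSV parsing.
func scanAngle(s string) []string {
	var out []string
	for {
		begin := strings.IndexByte(s, '<')
		if begin < 0 {
			return out // no more opening brackets
		}
		s = s[begin+1:]
		end := strings.IndexByte(s, '>')
		if end < 0 {
			return out // unterminated "<": stop
		}
		out = append(out, s[:end])
		s = s[end+1:]
	}
}

func main() {
	fmt.Println(scanAngle("user1 <user1@example.com>\nuser2 <user2@example.com>\n"))
}
```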

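This one is based on a simple CSV reader, so you can plug it into the output stream of an `exec.Cmd`, or anything else.

Its big advantage is that it does not need to buffer the whole output in memory; it can decode it as a stream.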

    package main

    import (
    	"encoding/csv"
    	"fmt"
    	"io"
    	"regexp"
    	"strings"
    )

    func main() {
    	fmt.Printf("%#v\n", extractEmailsCSV(csvInput))
    }

    var uid = regexp.MustCompile(`\<(.*?)\>`)

    // extractEmailsRegexp returns every <...> match with the brackets trimmed off.
    func extractEmailsRegexp(input string) (out []string) {
    	submatchall := uid.FindAllString(input, -1)
    	for _, element := range submatchall {
    		element = strings.Trim(element, "<")
    		element = strings.Trim(element, ">")
    		out = append(out, element)
    	}
    	return
    }

    // extractEmailsCSV reads colon-separated records and pulls the address
    // out of the user ID field (the 10th field).
    func extractEmailsCSV(input string) (out []string) {
    	r := csv.NewReader(strings.NewReader(input))
    	r.Comma = ':'
    	r.ReuseRecord = true
    	r.FieldsPerRecord = -1 // records may have different field counts

    	for {
    		record, err := r.Read()
    		if err == io.EOF {
    			break
    		} else if err != nil {
    			panic(err)
    		}

    		if len(record) < 10 {
    			continue
    		}

    		field := record[9]
    		if strings.Contains(field, "@") {
    			begin := strings.Index(field, "<")
    			end := strings.Index(field, ">")
    			// Both brackets must be present and in order.
    			if begin >= 0 && end > begin {
    				out = append(out, field[begin+1:end])
    			}
    		}
    	}
    	return
    }

    var regexpInput = `
        pub   rsa3072 2021-08-03 [SC] [expires: 2023-08-03]
              07C47E284765D5593171C18F00B11D51A071CB55
        uid           [ultimate] user1 <user1@example.com>
        sub   rsa3072 2021-08-03 [E] [expires: 2023-08-03]

        pub   rsa3072 2021-08-04 [SC]
              37709ABD4D96324AB8CBFC3B441812AFBCE7A013
        uid           [ultimate] user2 <user2@example.com>
        sub   rsa3072 2021-08-04 [E]
    `

    var csvInput = `pub:u:1024:17:51FF9A17136C5B87:1999-04-24::59:-:Tony Nelson <tnelson@techie.com>:
    uid:u::::::::Tony Nelson <tnelson@conceptech.com>:
    `

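The two functions are not benchmarked on exactly the same input, but anyway. If you think that skews the comparison, feel free to provide a better benchmark setup.

Here is the benchmark setup: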

    package main

    import (
    	"strings"
    	"testing"
    )

    func BenchmarkCSV_1(b *testing.B) {
    	input := strings.Repeat(csvInput, 1)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		_ = extractEmailsCSV(input)
    	}
    }
    func BenchmarkRegExp_1(b *testing.B) {
    	input := strings.Repeat(regexpInput, 1)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		_ = extractEmailsRegexp(input)
    	}
    }

    func BenchmarkCSV_10(b *testing.B) {
    	input := strings.Repeat(csvInput, 10)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		_ = extractEmailsCSV(input)
    	}
    }
    func BenchmarkRegExp_10(b *testing.B) {
    	input := strings.Repeat(regexpInput, 10)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		_ = extractEmailsRegexp(input)
    	}
    }

    func BenchmarkCSV_100(b *testing.B) {
    	input := strings.Repeat(csvInput, 100)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		_ = extractEmailsCSV(input)
    	}
    }
    func BenchmarkRegExp_100(b *testing.B) {
    	input := strings.Repeat(regexpInput, 100)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		_ = extractEmailsRegexp(input)
    	}
    }

And here is the result:

    BenchmarkCSV_1
    BenchmarkCSV_1-4        	  242736	      4200 ns/op	    5072 B/op	      18 allocs/op
    BenchmarkRegExp_1
    BenchmarkRegExp_1-4     	  252232	      4466 ns/op	     400 B/op	       9 allocs/op
    BenchmarkCSV_10
    BenchmarkCSV_10-4       	   68257	     17335 ns/op	    7184 B/op	      40 allocs/op
    BenchmarkRegExp_10
    BenchmarkRegExp_10-4    	   29871	     39947 ns/op	    3414 B/op	      68 allocs/op
    BenchmarkCSV_100
    BenchmarkCSV_100-4      	    7538	    141609 ns/op	   25872 B/op	     223 allocs/op
    BenchmarkRegExp_100
    BenchmarkRegExp_100-4   	    1726	    674718 ns/op	   37858 B/op	     615 allocs/op

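On the smallest input the two are about equally fast (4200 vs 4466 ns/op) and the regular expression allocates far less (400 vs 5072 B/op). As soon as there is a bit more data, though, the regular expression falls behind: with 100 repetitions it is nearly five times slower (674718 vs 141609 ns/op) and allocates more (37858 vs 25872 B/op, 615 vs 223 allocs/op).

See also https://pkg.go.dev/testing

My conclusion: don't use regular expressions... Besides, optimizing a regular expression is hard, close to impossible, whereas optimizing an algorithm that parses some text input is feasible, even easy.

To sum up, even the fastest and best runtime is of no help without a thoughtful programmer driving it.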

Answer 3

Score: 0

So I updated the regex, but since `(?<=\<)(.*?)(?=\>)` worked in the online simulator, I was really surprised.
Why can't regexes work the same way in every language...

    func parseOutput(outs []byte, GPG *GPGList) {
    	var uid = regexp.MustCompile(`\<(.*?)\>`)
    	submatchall := uid.FindAllString(string(outs), -1)
    	for _, element := range submatchall {
    		// Each match still includes the brackets, so trim them off.
    		element = strings.Trim(element, "<")
    		element = strings.Trim(element, ">")
    		fmt.Println(element)
    	}
    }
