Participle is stating Unexpected Token

Question

I am playing with participle to learn how to parse, and I cannot figure out why this token is unexpected.

// nolint: golint, dupl
package main

import (
	"fmt"
	"io"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
)

var htaccessLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"Comment", `^#[^\n]*`},
	{"Ident", `^\w+`},
	{"Int", `\d+`},
	{"String", `("(\\\"|[^"])*"|\S+)`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `[ \t]+`},
})

type HTACCESS struct {
	Directives []*Directive `@@*`
}

type Directive struct {
	Pos lexer.Position

	ErrorDocument *ErrorDocument `@@`
}

type ErrorDocument struct {
	Code int    `"ErrorDocument" @Int`
	Path string `@String`
}

var htaccessParser = participle.MustBuild[HTACCESS](
	participle.Lexer(htaccessLexer),
	participle.CaseInsensitive("Ident"),
	participle.Unquote("String"),
	participle.Elide("whitespace"),
)

func Parse(r io.Reader) (*HTACCESS, error) {
	program, err := htaccessParser.Parse("", r)
	if err != nil {
		return nil, err
	}

	return program, nil
}

func main() {
	v, err := htaccessParser.ParseString("", `ErrorDocument 403 test`)

	if err != nil {
		panic(err)
	}

	fmt.Println(v)
}

From what I can tell, this seems correct. I expect the 403 to be captured there, but I am not sure why the parser doesn't recognize it.
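
One way to narrow this down is to dump the raw token stream and see which lexer rule claims each piece of the input, in particular the 403. Below is a minimal debugging sketch, not part of the program above; it assumes participle v2's lexer.Definition / lexer.Lexer interfaces and needs "strings" added to the imports:

// dumpTokens is a throwaway debugging helper: it lexes the input with
// htaccessLexer and prints each token's rule name and value.
func dumpTokens(input string) {
	lex, err := htaccessLexer.Lex("", strings.NewReader(input))
	if err != nil {
		panic(err)
	}
	// Invert Symbols() so token types can be printed as rule names.
	names := map[lexer.TokenType]string{}
	for name, typ := range htaccessLexer.Symbols() {
		names[typ] = name
	}
	for {
		tok, err := lex.Next()
		if err != nil {
			panic(err)
		}
		if tok.Type == lexer.EOF {
			break
		}
		fmt.Printf("%-12s %q\n", names[tok.Type], tok.Value)
	}
}

Calling dumpTokens(`ErrorDocument 403 test`) from main shows whether 403 is emitted as Int or as some other token type, which is what the @Int capture depends on.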

Edit:
I changed my lexer to this:

var htaccessLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"dir", `^\w+`},
	{"int", `\d+`},
	{"str", `("(\\\"|[^"])*"|\S+)`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `\s+`},
})

The error is gone, but it still prints an empty array, and I am not sure why. I am also not sure why using different values in the lexer fixes it.
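
To check whether the directives are really empty, rather than just hidden by fmt.Println (which prints a tree of pointers mostly as addresses), here is a short sketch that could replace the body of main and walk the parsed result using the structs defined above:

	doc, err := htaccessParser.ParseString("", `ErrorDocument 403 test`)
	if err != nil {
		panic(err)
	}
	// Report how many directives were captured and what each one holds.
	fmt.Printf("parsed %d directive(s)\n", len(doc.Directives))
	for _, d := range doc.Directives {
		if d.ErrorDocument != nil {
			fmt.Printf("ErrorDocument: code=%d path=%q\n", d.ErrorDocument.Code, d.ErrorDocument.Path)
		}
	}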

Answer 1

Score: 2

I believe I found the issue: it is the rule order. Ident was matching numbers in my lexer via the \w pattern, so my integers were being tokenized as Ident instead of Int.

I found that I had to separate QuotedString and UnQuotedString, otherwise the unquoted-string rule was picking up integers. Alternatively I could make it match only non-numeric values, but that would miss things like stringwithnum2.

Here is my solution:

var htaccessLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"Comment", `(?i)#[^\n]*`},
	{"QuotedString", `"(\\"|[^"])*"`},
	{"Number", `[-+]?(\\d*\\.)?\\d+`},
	{"UnQuotedString", `[^ \t]+`},
	{"Ident", `^[a-zA-Z_]`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `[ \t]+`},
})

type ErrorDocument struct {
	Pos lexer.Position

	Code int    `"ErrorDocument" @Number`
	Path string `(@QuotedString | @UnQuotedString)`
}

This fixed my issue: the lexer now matches quoted strings first, then numbers, then unquoted strings.
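
For completeness, here is a sketch of the parser rebuilt against the reordered lexer, re-parsing the original input. The Unquote and Elide arguments are simply adjusted to the new rule names (QuotedString, whitespace); the rest mirrors the options from the question:

var htaccessParser = participle.MustBuild[HTACCESS](
	participle.Lexer(htaccessLexer),
	participle.Unquote("QuotedString"),
	participle.Elide("whitespace"),
)

func main() {
	doc, err := htaccessParser.ParseString("", `ErrorDocument 403 test`)
	if err != nil {
		panic(err)
	}
	ed := doc.Directives[0].ErrorDocument
	// 403 should now lex as Number and bind to Code, while test comes
	// through as UnQuotedString and binds to Path.
	fmt.Println(ed.Code, ed.Path)
}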
