Participle is stating Unexpected Token
Question
I am playing with participle to learn how to parse, and I cannot work out why this token is unexpected.
// nolint: golint, dupl
package main

import (
	"fmt"
	"io"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
)

var htaccessLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"Comment", `^#[^\n]*`},
	{"Ident", `^\w+`},
	{"Int", `\d+`},
	{"String", `("(\\"|[^"])*"|\S+)`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `[ \t]+`},
})

type HTACCESS struct {
	Directives []*Directive `@@*`
}

type Directive struct {
	Pos           lexer.Position
	ErrorDocument *ErrorDocument `@@`
}

type ErrorDocument struct {
	Code int    `"ErrorDocument" @Int`
	Path string `@String`
}

var htaccessParser = participle.MustBuild[HTACCESS](
	participle.Lexer(htaccessLexer),
	participle.CaseInsensitive("Ident"),
	participle.Unquote("String"),
	participle.Elide("whitespace"),
)

func Parse(r io.Reader) (*HTACCESS, error) {
	program, err := htaccessParser.Parse("", r)
	if err != nil {
		return nil, err
	}
	return program, nil
}

func main() {
	v, err := htaccessParser.ParseString("", `ErrorDocument 403 test`)
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
From what I can tell this is correct: I expect 403 to match @Int, but I am not sure why it isn't recognized.
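To see what the lexer actually emits for this input, here is a minimal standard-library sketch of an ordered, first-match-wins lexer. The `rule` and `token` types and the `tokenize` function are hypothetical stand-ins, not participle's API, and the sketch assumes participle's simple lexer likewise takes the first rule, in declaration order, that matches at the current position. It is fed the question's rules in the same order.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical stand-in for lexer.SimpleRule (a name plus a regex).
type rule struct {
	Name    string
	Pattern string
}

// token is a hypothetical stand-in for a lexed token.
type token struct {
	Type  string
	Value string
}

// tokenize tries the rules in declaration order at each position and keeps
// the first non-empty match (first match wins, not longest match). Tokens
// from the "whitespace" rule are dropped, similar to participle.Elide.
func tokenize(rules []rule, input string) ([]token, error) {
	compiled := make([]*regexp.Regexp, len(rules))
	for i, r := range rules {
		// Anchor each pattern so it can only match at the current position.
		compiled[i] = regexp.MustCompile(`^(?:` + r.Pattern + `)`)
	}
	var out []token
	for pos := 0; pos < len(input); {
		matched := false
		for i, re := range compiled {
			m := re.FindString(input[pos:])
			if m == "" {
				continue // no match (or only an empty match) for this rule
			}
			if rules[i].Name != "whitespace" {
				out = append(out, token{Type: rules[i].Name, Value: m})
			}
			pos += len(m)
			matched = true
			break
		}
		if !matched {
			return nil, fmt.Errorf("no rule matches at offset %d", pos)
		}
	}
	return out, nil
}

// questionRules copies the rule order from the question's lexer.
var questionRules = []rule{
	{"Comment", `^#[^\n]*`},
	{"Ident", `^\w+`},
	{"Int", `\d+`},
	{"String", `("(\\"|[^"])*"|\S+)`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `[ \t]+`},
}

func main() {
	toks, err := tokenize(questionRules, "ErrorDocument 403 test")
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("%s(%q)\n", t.Type, t.Value)
	}
	// Prints:
	// Ident("ErrorDocument")
	// Ident("403")
	// Ident("test")
}
```

Because `\w` also matches digits and Ident is listed first, `403` never reaches the Int rule, so a grammar expecting `@Int` at that position would see an Ident instead. Moving the Int entry above Ident makes the same input yield `Int("403")` in this sketch.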
Edit:
I changed my lexer to this:
var htaccessLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"dir", `^\w+`},
	{"int", `\d+`},
	{"str", `("(\\"|[^"])*"|\S+)`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `\s+`},
})
The error is gone, but it still prints an empty array, and I'm not sure why. I'm also unsure why changing the lexer rules this way makes the error go away.
Answer 1
Score: 2
I believe I found the issue: it is the rule order. Ident is listed before Int, and its \w class also matches digits, so my integers were being lexed as Ident tokens.
I also found that I have to split the string rule into QuotedString and UnQuotedString; otherwise the unquoted-string pattern was picking up integers. Alternatively, I could make it match only non-numeric values, but that would miss things like stringwithnum2.
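The trade-off between a digit-free rule and a plain "non-blank" rule can be checked with Go's standard `regexp` package. Both patterns are illustrative: `nonNumeric` is a hypothetical rule that refuses digits, and `unquoted` is the `[^ \t]+` pattern from the solution below. Anchored at the start of each word, `nonNumeric` matches nothing in `403` and only the `stringwithnum` prefix of `stringwithnum2`, leaving the `2` to be lexed separately; `unquoted` takes each whole word, digits included, which is why it has to be listed after the Number rule.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Hypothetical "non-numeric only" rule: a run of characters that are
	// neither whitespace nor digits, anchored at the start of the input.
	nonNumeric = regexp.MustCompile(`^[^\s\d]+`)
	// The UnQuotedString pattern from the solution: a run of characters
	// that are not spaces or tabs, anchored at the start of the input.
	unquoted = regexp.MustCompile(`^[^ \t]+`)
)

func main() {
	for _, word := range []string{"test", "403", "stringwithnum2"} {
		fmt.Printf("%-15s nonNumeric=%q unquoted=%q\n",
			word, nonNumeric.FindString(word), unquoted.FindString(word))
	}
}
```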
Here is my solution:
var htaccessLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"Comment", `(?i)#[^\n]*`},
	{"QuotedString", `"(\\"|[^"])*"`},
	{"Number", `[-+]?(\d*\.)?\d+`},
	{"UnQuotedString", `[^ \t]+`},
	{"Ident", `^[a-zA-Z_]`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `[ \t]+`},
})

type ErrorDocument struct {
	Pos  lexer.Position
	Code int    `"ErrorDocument" @Number`
	Path string `(@QuotedString | @UnQuotedString)`
}
This fixed my issue because the lexer now tries quoted strings first, then numbers, and only then unquoted strings.