Selector in PEG.js grammar accepting what it shouldn't
Question
I've recently been working on a custom programming language using PEG.js.
I made a system that recognises variable names and evaluates variable values, supporting access to object/array properties.
Global variables (`glob`):

```javascript
{
  "null": null,
  "undefined": undefined,
  test: {
    foobar: 'it worked'
  }
}
```
When I type `test`, it evaluates to `{foobar:"it worked"}` as expected. But when I type `test["foobar"]`, it should return `"it worked"`; instead I get this error:

Error: Variable 'test[' does not exist.
My PEG.js grammar:
```
Getvar
  = name:Varname path:('[' _ exp:(String/Integer) _ ']' {return exp;})* {
      let rt=glob[name];
      if(rt===undefined&&name!='undefined'&&name!='null')
        error(`Variable '${name}' does not exist.`);
      for(let p of path)rt=rt[p];
      return rt;
    }

Varname "variable name"
  = [A-z0-9]+{
      if(!/[A-z]+/.test(text()))
        error(`Variable name must contain at least one letter. (reading '${text()}')`);
      return text();
    }

String "string"
  = '"' chars:DoubleStringCharacter* '"' { return chars.join(''); }
  / "'" chars:SingleStringCharacter* "'" { return chars.join(''); }

DoubleStringCharacter
  = !('"' / "\\") char:. { return char; }
  / "\\" sequence:EscapeSequence { return sequence; }

SingleStringCharacter
  = !("'" / "\\") char:. { return char; }
  / "\\" sequence:EscapeSequence { return sequence; }

EscapeSequence
  = "'"
  / '"'
  / "\\"
  / "b" { return "\b"; }
  / "f" { return "\f"; }
  / "n" { return "\n"; }
  / "r" { return "\r"; }
  / "t" { return "\t"; }
  / "v" { return "\x0B"; }

Integer "integer"
  = _ [0-9]+ { return parseInt(text(), 10); }

_ "whitespace"
  = [ \t\n\r]*
```
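For reference, the lookup that the `Getvar` action performs can be sketched in plain JavaScript. This is a minimal sketch, assuming each bracketed segment is meant to index into the current value (`rt = rt[p]`); `getVar` is a hypothetical name, and throwing an `Error` stands in for PEG.js's `error()`. Given the name `test` and the path `["foobar"]`, it yields `"it worked"`, so the failure described here happens earlier, while the name itself is being parsed.

```javascript
// Hypothetical stand-in for the Getvar action: look up `name` in `glob`,
// then index into the result once per bracketed path segment.
function getVar(glob, name, path) {
  let rt = glob[name];
  if (rt === undefined && name !== "undefined" && name !== "null") {
    // PEG.js would call error(...) here; a plain Error is thrown instead.
    throw new Error(`Variable '${name}' does not exist.`);
  }
  for (const p of path) rt = rt[p]; // assumes `rt = rt[p]` was intended
  return rt;
}

const glob = {
  "null": null,
  "undefined": undefined,
  test: { foobar: "it worked" },
};

console.log(getVar(glob, "test", []));         // { foobar: 'it worked' }
console.log(getVar(glob, "test", ["foobar"])); // it worked
```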
Since the variable name has the pattern `[A-z0-9]+`, I have no idea why `[` passes as a variable name. As I was playing around trying to figure out what's going on, I discovered that the pattern somehow matches `A-z` (letters) and `0-9` (numbers), but also `[` and `]`.

Does anyone know why this is happening?
Answer 1
Score: 0
I don't fully understand why the generated parser behaves the way it does with quantified rules like the one for the variable name. However, if I replace your `Varname` with this:
```
Varname "variable name"
  = vfirst: Vstart vrest: Vtail* {
      let rv = vfirst + vrest.join("");
      return rv;
    }
Vstart = sc: [A-Z]i { return sc; }
Vtail = tc: [A-Z0-9]i { return tc; }
```
then it works as expected. Also, I added `_` at the end of the start rule.
Again, I don't know why this works, but the docs mention that quantifiers do not backtrack. My instinctive (but uninformed) reaction is to consider that kind of broken. In the change I made above, the quantifier is on the rule instead of the pattern, so each pattern can only match one character or nothing.
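The point about quantifiers not backtracking can be illustrated with a small plain-JavaScript model. `matchPlus`, `original` and `fixed` are hypothetical names; the sketch only models how a greedy `+` over a character class consumes input, not how PEG.js actually generates its parser. With the `[A-z0-9]` class the repetition runs straight through `[` and stops at `"`, which is exactly the `test[` in the error message; with a letters-and-digits class it stops before `[`, leaving the bracket for the `path` part of `Getvar`.

```javascript
// Hypothetical helper: models a PEG `+` repetition over a single-character
// test. It consumes greedily while the test passes and never gives input back.
function matchPlus(input, pos, isMember) {
  let end = pos;
  while (end < input.length && isMember(input[end])) end++;
  return end > pos ? input.slice(pos, end) : null; // null means "no match"
}

// Equivalent of the class [A-z0-9]: a code-unit range from "A" to "z", plus digits.
const original = (ch) => (ch >= "A" && ch <= "z") || (ch >= "0" && ch <= "9");
// Equivalent of the class [A-Z0-9]i: ASCII letters (either case) and digits.
const fixed = (ch) => /^[A-Z0-9]$/i.test(ch);

const source = 'test["foobar"]';
console.log(matchPlus(source, 0, original)); // test[
console.log(matchPlus(source, 0, fixed));    // test
```

In other words, greedy repetition on its own is harmless here; the character class decides where the name ends.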
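Edit: I'm leaving the above for historical interest, but the real problem was the pattern. In a character class, `A-z` is a range of character codes, so besides the letters it also picks up the six characters that sit between `Z` and `a`: `[`, `\`, `]`, `^`, `_` and `` ` ``. `[A-Z0-9]i` (or `[A-Za-z0-9]`) will work better. (Note that the OP figured this out, not me.)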
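To see exactly what the range picks up, the snippet below lists the characters between `A` and `z` that are not letters. It assumes that PEG.js class ranges compare character codes the same way JavaScript regex ranges do, which is consistent with the behaviour described in the question; `extras` is just an illustrative name. The same range also appears in the `/[A-z]+/` guard inside the `Varname` action, so that check accepts a name such as `_` even though it contains no letter.

```javascript
// Collect every character whose code lies in the range A..z
// (U+0041..U+007A) but is not an ASCII letter.
const extras = [];
for (let code = "A".charCodeAt(0); code <= "z".charCodeAt(0); code++) {
  const ch = String.fromCharCode(code);
  if (!/[A-Za-z]/.test(ch)) extras.push(ch);
}
console.log(extras.join(" ")); // [ \ ] ^ _ `

// JavaScript regex classes use the same range semantics, so the guard in
// the Varname action, /[A-z]+/, is satisfied by "_".
console.log(/[A-z]+/.test("_"));    // true
console.log(/[A-Za-z]+/.test("_")); // false
```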