How can the syntactic analyzer ignore whitespace in the input?
Question
In the code below, I added a whitespace rule ([ \t]) and gave it a higher priority than the digits rule. Even so, when I test it, it accepts -2- 2 (one space after the -) and -2 - 2 (one space on each side of the -), but it does not accept -2-2 (no spaces). Is there a solution for this particular situation?
What I'm aiming for is that input such as -2-2-2 or 2*3/6-1 works and does not output 'syntax error'.
lexical.l
/* recognize tokens for the calculator and print them out */
/*"--"[0-9]+ { yytext++; yylval = atoi(yytext); return NUMBER; }*/
%{
#include"syntax.tab.h"
%}
%%
"+" { return ADD; }
"-" { return SUB; }
"*" { return MUL; }
"/" { return DIV; }
"|" { return ABS; }
[ \t] { /* ignore whitespace */ }
(-|"")[0-9]+ { yylval = atoi(yytext); return NUMBER; }
\n { return EOL; }
. { printf("Mystery character %c\n", *yytext); }
%%
syntax.y
/* simplest version of calculator */
%{
#include <stdio.h>
%}
/* declare tokens */
%token NUMBER
%token ADD SUB MUL DIV ABS
%token EOL
%%
calclist: /* nothing */
| calclist exp EOL { printf("= %d\n", $2); }
;
exp: factor
| exp ADD factor { $$ = $1 + $3; }
| exp SUB factor { $$ = $1 - $3; }
;
factor: term
| factor MUL term { $$ = $1 * $3; }
| factor DIV term { $$ = $1 / $3; }
;
term: NUMBER
| ABS term { if ($2 < 0) $$ = -$2; else $$ = $2; }
;
%%
main(int argc, char **argv)
{
yyparse();
}
yyerror(char *s)
{
fprintf(stderr, "error: %s\n", s);
}
Answer 1
Score: 2
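The lexer interprets -2-2 as the two tokens -2 and -2, not as the three tokens -2, - and 2.
A lexer always looks for the longest match, so it will always prefer to treat a minus sign followed by digits as a single token. With a space after the minus, as in -2- 2, the number rule cannot match at that point, so the - is returned as SUB and the input parses.
Your parser doesn't have a rule that accepts two consecutive numbers, so it reports a syntax error.
(You should learn to turn on debug output in the lexer and the parser. It is very helpful in cases like this.)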
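As a rough sketch of what turning on debug output can look like, assuming flex and a reasonably recent Bison (3.0 or later for `%define parse.trace`; older versions spell it `%debug`), the fragments below would be added to the two files from the question. The `int main(void)` shown here is a replacement for the original `main`, not something the asker wrote.

```
/* lexical.l, definitions section: generate a tracing scanner
   (same effect as running `flex -d`). The scanner then prints
   each accepted rule and the matched text to stderr. */
%option debug

/* syntax.y, declarations section: generate parser tracing code
   (same effect as running `bison -t`). */
%define parse.trace

/* syntax.y, user code section after the second %% */
int main(void)
{
#if YYDEBUG
    yydebug = 1;   /* print every shift and reduce to stderr */
#endif
    return yyparse();
}
```

With the scanner trace enabled, the input -2-2 should show the number rule being accepted twice with the text -2, which makes the cause of the syntax error visible immediately.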
To fix the problem, stop treating - as part of a number in the lexer.
Baking the sign into the number token is a very common mistake that leads to exactly the problem you encountered.
Instead, you can define a unary - operator in the parser.
(By the way, (-|"") can be written as -?.)
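One possible way to apply this advice to the files from the question is sketched below. Only the changed rules are shown. Putting unary minus in `term`, next to `ABS`, is an assumption about where it fits best in this grammar, not the only option.

```
/* lexical.l: a number is digits only; a leading '-' is never part of it,
   so every '-' is returned as SUB by the existing "-" rule.
   (atoi is declared in <stdlib.h>, which the %{ %} block should include.) */
[0-9]+ { yylval = atoi(yytext); return NUMBER; }

/* syntax.y: unary minus is handled by the parser, at the same level as ABS */
term: NUMBER
 | SUB term { $$ = -$2; }
 | ABS term { if ($2 < 0) $$ = -$2; else $$ = $2; }
 ;
```

With these two changes, -2-2-2 is tokenized as SUB NUMBER SUB NUMBER SUB NUMBER and evaluates to -6, and 2*3/6-1 evaluates to 0. Because unary minus is now a grammar rule, whitespace around it no longer affects the result.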