Syntax problem with Bison and Flex during compilation and execution
Question
I'm currently working on a project that consists of parsing the content of a text file representing a plane ticket using bison and flex. I have created two files, ticket.y and ticket.l, to define the grammar rules and the corresponding regular expressions.
The example file I want to analyze is the following (ExampleAirplaneTicket.txt):
DOSSIER YBNUKR
ANTOINE/DESAINT-EXUPERY
22/01/16 OS412 CDG 10:00 VIE 12:00 2:00
22/01/16 OS051 VIE 13:20 NRT +07:25 11:05
23/01/16 OS8577 NRT 10:00 CHI 09:00 01:45
Here is the content of my billet.l (ticket.l) file:
%{
#include "billet.tab.h"
void yyerror(const char *s);
%}
DIGIT [0-9]
ALPHA [A-Za-z]
SEP [ \t]
%%
"DOSSIER" { return DOSSIER; }
{ALPHA}{6} { return CODE_DOSSIER; }
{ALPHA}{3}"/" { yylval.sval = strdup(yytext); return CODE_AEROPORT; }
{ALPHA}{4}+("/"{ALPHA}+)?("-"{ALPHA}+)? { yylval.sval = strdup(yytext); return NOM_PRENOM; }
{DIGIT}{2}"/"{DIGIT}{2}"/"{DIGIT}{2} { return DATE; }
{ALPHA}{2}{DIGIT}{2,4} { return NUM_VOL; }
{ALPHA}{3} { return CODE_AEROPORT; }
{DIGIT}{2}":"{DIGIT}{2} { yylval.sval = strdup(yytext); return HEURE_OR_DUREE_VOL; }
"+" { return PLUS; }
{SEP}+ { }
\n { return NEWLINE; }
. { fprintf(stderr, "Caractère non autorisé: '%s'\n", yytext); exit(1); }
%%
And here is the content of my billet.y (ticket.y) file:
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void yyerror(const char *s);
int yylex();
%}
%union {
char *sval;
}
%token DOSSIER CODE_DOSSIER NEWLINE PLUS
%token <sval> DATE NUM_VOL CODE_AEROPORT HEURE_OR_DUREE_VOL
%token <sval> NOM_PRENOM
%type <sval> nom_prenom
%type <sval> heure_arrivee
%type <sval> heure_avec_plus
%%
billet: DOSSIER CODE_DOSSIER NEWLINE infos_passager NEWLINE vols;
infos_passager: nom_prenom '/' nom_prenom NEWLINE { printf("Infos passager : %s / %s\n", $1, $3); };
vols: vol NEWLINE vols | vol NEWLINE;
vol: DATE NUM_VOL CODE_AEROPORT HEURE_OR_DUREE_VOL CODE_AEROPORT heure_arrivee HEURE_OR_DUREE_VOL { printf("Vol : %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7); };
heure_arrivee: heure_avec_plus | HEURE_OR_DUREE_VOL;
heure_avec_plus: PLUS HEURE_OR_DUREE_VOL { $$ = $2; };
nom_prenom: NOM_PRENOM;
%%
int main() {
yyparse();
return 0;
}
void yyerror(const char *s) {
fprintf(stderr, "Erreur de syntaxe : %s\n", s);
}
When I compile everything, I can't test my program on the ExampleAirplaneTicket.txt file.
I simply have a syntax error, and despite several attempts, I have not been able to solve these problems or even figure out where it comes from.
I am looking for help to understand and solve these problems. If you have any suggestions or advice on how to solve these errors, I would be very grateful.
I tried to implement a parser using flex and bison to parse a specific text format representing airline ticket information. I wrote the .l and .y files and made the necessary adjustments based on the previous problems. I expected the program to compile successfully and parse the file ExampleAirplaneTicket.txt without any syntax errors or other problems. However, when I test it with the file I just get a syntax error, with no indication of where it comes from.
When I compile billet.l I get these warnings (I don't think they are a problem):
billet.l:17: warning, the rule can't match
billet.l:20: warning, the rule can't match
I get no warnings when I compile billet.y, nor when I compile everything with gcc.
But when I test with the text file I get this:
Syntax error
UPDATE:
I combined the tokens HEURE and DUREE_VOL into one token: HEURE_OR_DUREE_VOL (TIME_OR_FLIGHT_TIME).
My files above have been updated.
I no longer get warnings on lines 17 and 20, but I now get a warning on line 18 when compiling billet.l.
I still get the same syntax error when running the program on the text file.
File billet.l:
I modified line 15 to change the pattern from {ALPHA}+("/"{ALPHA}+)?("-"{ALPHA}+)? to {ALPHA}{4}+("/"{ALPHA}+)?("-"{ALPHA}+)? so that a name is assumed to have at least 4 letters.
I also changed the token returned on line 15 from NOM_PRENOM to STRING.
File billet.y:
I added a new token <sval> STRING to the list of tokens.
I modified the nom_prenom rule to accept either STRING or STRING / STRING.
I now get new warnings when compiling billet.y (and none from billet.l anymore):
ticket.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
ticket.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
And I still get a syntax error when I run the program on my text file ExampleAirplaneTicket.txt.
Answer 1
Score: 1
The following warnings from flex:
billet.l:17: warning, the rule can't match
billet.l:20: warning, the rule can't match
come from the fact that:
- the rule for NOM_PRENOM covers what is expected by the rule for CODE_AEROPORT;
- the rule for the HEURE token has the same pattern as the rule for DUREE_VOL.
So some of the tokens (CODE_AEROPORT and DUREE_VOL) will never be returned. This may be the reason why you get the default "Syntax error" message.
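To illustrate why flex reports that a rule cannot match, here is a minimal C sketch of the scanner's selection policy: the longest match wins, and on a tie the rule that appears first in the file wins. This is not flex itself. The rule table is an assumed reconstruction of the pre-update billet.l, the patterns are rewritten as POSIX extended regular expressions, and winning_rule is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* One scanner rule: a token name and an anchored POSIX ERE pattern. */
struct rule {
    const char *token;
    const char *pattern;
};

/* ASSUMPTION: reconstruction of the pre-update rules, in file order. */
static const struct rule RULES[] = {
    { "NOM_PRENOM",    "^[A-Za-z]+(/[A-Za-z]+)?(-[A-Za-z]+)?" },
    { "CODE_AEROPORT", "^[A-Za-z]{3}" },
    { "HEURE",         "^[0-9]{2}:[0-9]{2}" },
    { "DUREE_VOL",     "^[0-9]{2}:[0-9]{2}" },
};

/* Return the token a flex-like scanner would pick at the start of `input`:
 * the longest match wins, and on a tie the earliest rule wins.
 * Returns NULL when no rule matches. */
const char *winning_rule(const char *input)
{
    const char *best = NULL;
    regoff_t best_len = 0;

    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        regex_t re;
        regmatch_t m;
        int rc = regcomp(&re, RULES[i].pattern, REG_EXTENDED);

        assert(rc == 0);            /* the patterns are fixed, so this is a bug */
        (void)rc;
        if (regexec(&re, input, 1, &m, 0) == 0 && m.rm_eo > best_len) {
            best_len = m.rm_eo;     /* strictly longer only: later rules lose ties */
            best = RULES[i].token;
        }
        regfree(&re);
    }
    return best;
}
```

With this table, winning_rule("CDG") yields NOM_PRENOM rather than CODE_AEROPORT, and winning_rule("10:00") yields HEURE and never DUREE_VOL, which is the situation the two warnings describe.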
Note: The C source file generated by bison shows that "Syntax error" is reported when the number of reported tokens (yycount internal variable) is 0:
/*
[...]
- The only way there can be no lookahead present (in yychar) is if
this state is a consistent state with a default action. Thus,
detecting the absence of a lookahead is sufficient to determine
that there is no unexpected or expected token to report. In that
case, just report a simple "syntax error".
[...]
*/
[...]
switch (yycount)
{
# define YYCASE_(N, S) \
case N: \
yyformat = S; \
break
YYCASE_(0, YY_("syntax error"));
YYCASE_(1, YY_("syntax error, unexpected %s"));
YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
YYCASE_(4, YY_("syntax error, unexpected %s, expecting %s or %s or %s"));
YYCASE_(5, YY_("syntax error, unexpected %s, expecting %s or %s or %s or %s"));
# undef YYCASE_
}
Update
Following the latest modifications of the post: there are ambiguities in your lexical analyzer. The work of discriminating between the inputs should be done in the grammar. Here is a proposal in which the number of tokens in the lexical analyzer is reduced and the grammar rules are more detailed.
Here is the simplified lexical analyzer (billet.l):
%{
#include "billet.tab.h"
void yyerror(const char *s);
%}
DIGIT [0-9]
ALPHA [A-Za-z]
ALPHA2 [-A-Za-z]
SEP [ \t]
%%
"DOSSIER" { return DOSSIER; }
{ALPHA2}+ { yylval.sval = strdup(yytext); return STRING; }
{DIGIT}+ { yylval.sval = strdup(yytext); return NUM; }
{ALPHA}{2}{DIGIT}{2,4} { yylval.sval = strdup(yytext); return NUM_VOL; }
"+" { return PLUS; }
{SEP}+ { }
\n { return NEWLINE; }
"/" { return SLASH; }
":" { return COLON; }
. { fprintf(stderr, "Caractère non autorisé: '%s'\n", yytext); exit(1); }
%%
And a slightly more elaborate syntax analyzer (billet.y):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void yyerror(const char *s);
int yylex();
%}
%union {
char *sval;
}
%token DOSSIER CODE_DOSSIER NEWLINE PLUS SLASH COLON
%token <sval> DATE NUM_VOL STRING NUM
%token <sval> NOM_PRENOM
%type <sval> duree_vol
%type <sval> heure
%type <sval> airport
%type <sval> nom_prenom
%type <sval> date
%type <sval> heure_arrivee
%type <sval> heure_avec_plus
%define parse.error verbose
%%
liste : billet liste | billet
billet: DOSSIER STRING NEWLINE infos_passager NEWLINE vols;
infos_passager: nom_prenom { printf("Infos passager : %s\n", $1); free($1); };
vols: vol NEWLINE vols | vol NEWLINE;
vol: date NUM_VOL airport heure airport heure_arrivee duree_vol { printf("Vol : %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7); free($1); free($2); free($3); free($4); free($5); free($6); free($7); };
duree_vol : heure
heure : NUM COLON NUM { char str[20]; snprintf(str, sizeof(str), "%s:%s", $1, $3); $$ = strdup(str); free($1); free($3); }
airport : STRING
date: NUM SLASH NUM SLASH NUM { char str[20]; snprintf(str, sizeof(str), "%s/%s/%s", $1, $3, $5); $$ = strdup(str); free($1); free($3); free($5); }
heure_arrivee: heure_avec_plus | heure;
heure_avec_plus: PLUS heure { $$ = $2; };
nom_prenom: STRING | STRING SLASH STRING { char str[120]; snprintf(str, sizeof(str), "%s/%s", $1, $3); $$ = strdup(str); free($1); free($3); };
%%
int main() {
yyparse();
return 0;
}
void yyerror(const char *s) {
fprintf(stderr, "Erreur de syntaxe : %s\n", s);
}
Build it:
$ flex billet.l
$ bison -d billet.y
$ gcc billet.tab.c lex.yy.c -lfl
And run it with something like:
$ ./a.out < input.txt
Infos passager : ANTOINE/DESAINT-EXUPERY
Vol : 22/01/16 OS412 CDG 10:00 VIE 12:00 2:00
Vol : 22/01/16 OS051 VIE 13:20 NRT 07:25 11:05
Vol : 23/01/16 OS8577 NRT 10:00 CHI 09:00 01:45
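The actions for heure, date and nom_prenom above build their results in fixed-size buffers (char str[20], char str[120]), so a longer name would be silently truncated by snprintf. One possible alternative, shown only as a sketch, is a small helper that allocates exactly the space it needs. The name join2 is hypothetical and is not part of the original answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string "a<sep>b".
 * Returns NULL if the allocation fails. The caller frees the result. */
char *join2(const char *a, const char *sep, const char *b)
{
    assert(a != NULL && sep != NULL && b != NULL);

    size_t la = strlen(a);
    size_t ls = strlen(sep);
    size_t lb = strlen(b);

    char *out = malloc(la + ls + lb + 1);   /* +1 for the terminating NUL */
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, sep, ls);
    memcpy(out + la + ls, b, lb + 1);       /* also copies b's NUL terminator */
    return out;
}
```

An action such as the one for heure could then read { $$ = join2($1, ":", $3); free($1); free($3); }, and the one for date could chain two calls, freeing the intermediate string.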