Reading all characters until the occurrence of ; but not enclosed by ""

Question

I need to parse (or tokenize) the following text:

ASK "Hey dude, what's about \";\"" + "?";
ASK "How old are you?" INTO inAge;
ASK "This is a
multiline String with \";\";" + " can you parse this?"; ANSWER "Sure, i can!";

In the lexer, I tried it with modes:

ASK     : 'ASK' -> pushMode(UNTILSEMI) ;
ANSWER  : 'ANSWER' -> pushMode(UNTILSEMI) ;

mode UNTILSEMI;
ENDSEMI   : ';'+ -> popMode ;
CONTENT   : ~[;]+ ;

The parser rules would be:

askStmt: ASK CONTENT ENDSEMI;
answerStmt: ANSWER CONTENT ENDSEMI;

My problem: when there are semicolons inside the "strings", the tokenizer stops at the first one and the parser fails.

I have no idea where to start. Should I manipulate the lexer directly? Can I do this with lexer modes?

Answer 1

Score: 0

I don't see the need for lexical modes. Something like this would handle your example input correctly:

parse
 : ( question | answer )* EOF
 ;

question
 : ASK expression ( INTO ID )? SEMI
 ;

answer
 : ANSWER expression SEMI
 ;

expression
 : expression PLUS expression
 | STRING
 | ID
 ;

ASK    : 'ASK';
ANSWER : 'ANSWER';
INTO   : 'INTO';
ID     : [a-zA-Z]+;
PLUS   : '+';
SEMI   : ';';
SPACES : [ \t\r\n]+ -> skip;
STRING : '"' ( ~[\\"] | '\\' . )* '"';
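The part doing the work here is the STRING rule: between the quotes it accepts either any character other than a backslash or a quote, or a backslash followed by any character, so neither `\"` nor `;` ends the token. As a rough way to check this outside ANTLR, the same pattern can be written as a Python regular expression. This is only an approximation of the lexer rule, and the name STRING_RE is made up for the example:

```python
import re

# Approximation of the lexer rule  STRING : '"' ( ~[\\"] | '\\' . )* '"';
# re.DOTALL lets '.' match a newline after a backslash, as ANTLR's '.' does.
STRING_RE = re.compile(r'"(?:[^\\"]|\\.)*"', re.DOTALL)

# First line of the example input from the question.
line = r'''ASK "Hey dude, what's about \";\"" + "?";'''

# Each match is one complete string literal, escaped quotes included.
strings = STRING_RE.findall(line)
for literal in strings:
    print(literal)
```

Only the final `;` is left outside the two matches, which is why a plain SEMI token is enough to end the statement.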

EDIT

Even without an expression rule, and with only a few tokens, I don't see the need for lexical modes:

parse
 : ( question | answer )* EOF
 ;

question
 : ASK ~SEMI* SEMI OTHER*
 ;

answer
 : ANSWER ~SEMI* SEMI OTHER*
 ;

ASK    : 'ASK';
ANSWER : 'ANSWER';
SEMI   : ';';
STRING : '"' ( ~[\\"] | '\\' . )* '"';
OTHER  : ~[";];

which will parse your example input as follows:

[image from the original answer not preserved]
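The idea behind the second grammar can be sketched without ANTLR: because a whole string literal is consumed as a single STRING token, a `;` inside it never shows up as SEMI. The Python below imitates the three rules STRING, SEMI and OTHER with one regular expression and cuts the input at each SEMI. The names TOKEN_RE and split_statements are hypothetical, and the sketch assumes every string literal is properly closed:

```python
import re

# Stand-ins for the lexer rules, tried in this order: STRING | SEMI | OTHER.
TOKEN_RE = re.compile(r'"(?:[^\\"]|\\.)*"|;|[^";]', re.DOTALL)

def split_statements(source):
    """Split source at semicolons that are not inside a double-quoted string."""
    statements, current = [], []
    for match in TOKEN_RE.finditer(source):
        token = match.group()
        if token == ";":
            # SEMI ends the current statement.
            statements.append("".join(current).strip())
            current = []
        else:
            # STRING and OTHER tokens are collected unchanged.
            current.append(token)
    # Text after the last semicolon is dropped in this sketch.
    return statements

source = r'''ASK "Hey dude, what's about \";\"" + "?";
ASK "How old are you?" INTO inAge;
ASK "This is a
multiline String with \";\";" + " can you parse this?"; ANSWER "Sure, i can!";'''

for statement in split_statements(source):
    print(repr(statement))
```

For the example input this yields four statements, three ASK and one ANSWER, with the quoted semicolons left intact.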
