Reading all characters up to the first ; that is not enclosed in double quotes
Question
OK, I have the following problem:
I need to parse (or tokenize) the following text:
ASK "Hey dude, what's about \";\"" + "?";
ASK "How old are you?" INTO inAge;
ASK "This is a
multiline String with \";\";" + " can you parse this?"; ANSWER "Sure, i can!";
In the lexer, I tried it with lexical modes:
ASK : 'ASK' -> pushMode(UNTILSEMI) ;
ANSWER : 'ANSWER' -> pushMode(UNTILSEMI) ;
mode UNTILSEMI;
ENDSEMI : ';'+ -> popMode ;
CONTENT : ~[;]+ ;
The parser rules are:
askStmt: ASK CONTENT ENDSEMI;
answerStmt: ANSWER CONTENT ENDSEMI;
My problem: when there are semicolons inside "strings", the tokenizer stops at the first one (CONTENT : ~[;]+ cannot skip a ; that sits inside quotes), and the parser won't work.
I have no idea how to start. Should I manipulate the lexer directly? Can I do this with lexer modes?
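The failure can be reproduced with a minimal Python sketch, assuming the UNTILSEMI mode behaves like the regexes [^;]+ for CONTENT and ;+ for ENDSEMI (the helper lex_until_semi is hypothetical, not part of ANTLR). Because CONTENT knows nothing about quotes, it ends at the first ;, even though that ; sits inside \";\":

```python
import re

# Hypothetical regex stand-ins for the question's UNTILSEMI mode:
#   ENDSEMI : ';'+ -> popMode ;
#   CONTENT : ~[;]+ ;
CONTENT = re.compile(r"[^;]+")
ENDSEMI = re.compile(r";+")


def lex_until_semi(text: str, pos: int) -> tuple[str, str, int]:
    """Lex one CONTENT and one ENDSEMI token starting at pos.

    Returns (content, endsemi, index just past the ENDSEMI token).
    """
    content = CONTENT.match(text, pos)
    if content is None:
        raise SyntaxError(f"no CONTENT at position {pos}")
    endsemi = ENDSEMI.match(text, content.end())
    if endsemi is None:
        raise SyntaxError(f"no ';' after position {content.end()}")
    return content.group(), endsemi.group(), endsemi.end()


source = r'''ASK "Hey dude, what's about \";\"" + "?";'''
content, semi, end = lex_until_semi(source, len("ASK"))  # text after the ASK keyword
rest = source[end:]  # what popMode hands back to the default mode

print(f"CONTENT   = [{content}]")  # CONTENT   = [ "Hey dude, what's about \"]
print(f"ENDSEMI   = [{semi}]")     # ENDSEMI   = [;]
print(f"left over = [{rest}]")     # left over = [\"" + "?";]
```

The string literal is cut in half: the mode is popped after the inner ;, and the rest of the literal is left for the default mode, which has no rule for it.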
Answer 1
Score: 0
I don't see the need for lexical modes. Something like this would handle your example input correctly:
parse
 : ( question | answer )* EOF
 ;
question
 : ASK expression ( INTO ID )? SEMI
 ;
answer
 : ANSWER expression SEMI
 ;
expression
 : expression PLUS expression
 | STRING
 | ID
 ;
ASK : 'ASK';
ANSWER : 'ANSWER';
INTO : 'INTO';
ID : [a-zA-Z]+;
PLUS : '+';
SEMI : ';';
SPACES : [ \t\r\n]+ -> skip;
STRING : '"' ( ~[\\"] | '\\' . )* '"';
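The lexer rules above can be approximated outside ANTLR for a quick experiment. The following is a hypothetical Python port, not ANTLR itself: Python's re module takes the first alternative that matches rather than the longest one, so the keywords are listed before ID and guarded with \b. The STRING pattern mirrors '"' ( ~[\\"] | '\\' . )* '"'. In this regex version the backslash must be excluded from the first alternative; with [^"] alone, the backslash of \" would be consumed as an ordinary character and the string would end at the escaped quote.

```python
import re

# Approximation of the answer's lexer rules (hypothetical Python port, not ANTLR).
# Python's re takes the first alternative that matches instead of ANTLR's
# longest match, so keywords are listed before ID and guarded with \b.
TOKEN_SPEC = [
    ("ASK", r"\bASK\b"),
    ("ANSWER", r"\bANSWER\b"),
    ("INTO", r"\bINTO\b"),
    ("ID", r"[a-zA-Z]+"),
    ("PLUS", r"\+"),
    ("SEMI", r";"),
    ("SPACES", r"[ \t\r\n]+"),
    # STRING : '"' ( ~[\\"] | '\\' . )* '"' ;
    # [^\\"] never consumes a backslash, so each backslash starts an escape pair.
    ("STRING", r'"(?:[^\\"]|\\.)*"'),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # like ANTLR's '.', the escape pair may also escape a newline
)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (token type, lexeme) pairs; SPACES are dropped, like '-> skip'."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"token recognition error at position {pos}")
        if match.lastgroup != "SPACES":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


SAMPLE = r'''ASK "Hey dude, what's about \";\"" + "?";
ASK "How old are you?" INTO inAge;'''

for kind, lexeme in tokenize(SAMPLE):
    print(kind, lexeme)
```

On the first statement this yields ASK, STRING, PLUS, STRING, SEMI: the \";\" stays inside the first STRING token, so only the final ; becomes SEMI.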
EDIT
Even without an expression rule, using just a few tokens, I don't see the need for lexical modes:
parse
 : ( question | answer )* EOF
 ;
question
 : ASK ~SEMI* SEMI OTHER*
 ;
answer
 : ANSWER ~SEMI* SEMI OTHER*
 ;
ASK : 'ASK';
ANSWER : 'ANSWER';
SEMI : ';';
STRING : '"' ( ~[\\"] | '\\' . )* '"';
OTHER : ~[";];
which will parse your example input as follows:
[image: parse tree of the example input]
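Why the EDIT version works: STRING is a single token, so a ; inside a string never becomes a SEMI token and ~SEMI* cannot stop there. The same idea can be sketched without ANTLR as a small Python scanner (the helper names split_statements and parse are hypothetical) that treats ; as a terminator only outside double-quoted strings and, inside a string, skips the character after each backslash:

```python
KEYWORDS = {"ASK": "question", "ANSWER": "answer"}


def split_statements(text: str) -> list[str]:
    r"""Split text at every ';' that is not inside a double-quoted string.

    Inside a string a backslash escapes the next character, mirroring the
    answer's STRING rule '"' ( ~[\\"] | '\\' . )* '"'. Each statement keeps
    its terminating ';' and is stripped of surrounding whitespace.
    """
    statements = []
    start = 0
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character, e.g. the quote in \"
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            statements.append(text[start : i + 1].strip())
            start = i + 1
        i += 1
    if in_string:
        raise SyntaxError("unterminated string literal")
    if text[start:].strip():
        raise SyntaxError("missing ';' after the last statement")
    return statements


def parse(text: str) -> list[tuple[str, str]]:
    """Label each statement 'question' or 'answer' by its leading keyword."""
    result = []
    for statement in split_statements(text):
        keyword = statement.split(maxsplit=1)[0]
        if keyword not in KEYWORDS:
            raise SyntaxError(f"expected ASK or ANSWER, got {keyword!r}")
        result.append((KEYWORDS[keyword], statement))
    return result


SAMPLE = r'''ASK "Hey dude, what's about \";\"" + "?";
ASK "How old are you?" INTO inAge;
ASK "This is a
multiline String with \";\";" + " can you parse this?"; ANSWER "Sure, i can!";'''

for rule, statement in parse(SAMPLE):
    print(rule, "->", statement)
```

For the example input this yields three question statements and one answer statement; the multiline string and the escaped \";\" pairs stay intact.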