August 10, 2023 17:40:02


Reading all characters until an occurrence of ; that is not enclosed in ""

Question

OK, I have the following problem:

I need to parse (or tokenize) the following text:

ASK "Hey dude, what's about \";\"" + "?";
ASK "How old are you?" INTO inAge;
ASK "This is a
multiline String with \";\";" + " can you parse this?"; ANSWER "Sure, i can!";

In the lexer, I tried it with modes:

ASK     : 'ASK' -> pushMode(UNTILSEMI) ;
ANSWER  : 'ANSWER' -> pushMode(UNTILSEMI) ;

mode UNTILSEMI;
ENDSEMI   : ';'+ -> popMode ;
CONTENT   : ~[;]+ ;

The parser rules are:

askStmt: ASK CONTENT ENDSEMI;
answerStmt: ANSWER CONTENT ENDSEMI;

My problem: when there are semicolons inside "strings", the tokenizer stops at them (CONTENT : ~[;]+ ends at the first semicolon, quoted or not), and the parser won't work.

I have no idea how to start. Should I manipulate the lexer directly? Can I do this with lexer modes?
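To see exactly where the mode-based rules go wrong, here is a minimal Python sketch that emulates the UNTILSEMI mode for a single ASK statement. It is an approximation under stated assumptions: Python's re stands in for the ANTLR lexer, CONTENT is modeled as [^;]+ (the equivalent of ~[;]+), ENDSEMI as ;+, and the function name lex_with_modes is hypothetical.

```python
import re

# Hypothetical emulation of the question's lexer for one ASK statement.
# In mode UNTILSEMI, CONTENT is ~[;]+ and ENDSEMI is ';'+ (then popMode).
UNTILSEMI = re.compile(r"(?P<CONTENT>[^;]+)(?P<ENDSEMI>;+)")


def lex_with_modes(text):
    """Return the tokens for one ASK statement plus the unlexed remainder."""
    if not text.startswith("ASK"):
        raise ValueError("expected ASK")
    m = UNTILSEMI.match(text, len("ASK"))  # ASK -> pushMode(UNTILSEMI)
    if m is None:
        raise ValueError("no ';' after ASK")
    tokens = [
        ("ASK", "ASK"),
        ("CONTENT", m.group("CONTENT")),  # stops at the first ';', quoted or not
        ("ENDSEMI", m.group("ENDSEMI")),
    ]
    leftover = text[m.end():]  # after popMode, the default mode sees this text
    return tokens, leftover


line = r'''ASK "Hey dude, what's about \";\"" + "?";'''
tokens, leftover = lex_with_modes(line)
print(tokens[1])  # CONTENT ends inside the string literal
print(leftover)   # the rest of the statement falls back to the default mode
```

Because ~[;]+ has no notion of quotes, CONTENT ends at the escaped semicolon inside the literal, and the remainder of the statement is lexed in the default mode instead of being part of CONTENT.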

Answer 1

Score: 0

I don't see the need for lexical modes. Something like this would handle your example input correctly:

parse
 : ( question | answer )* EOF
 ;

question
 : ASK expression ( INTO ID )? SEMI
 ;

answer
 : ANSWER expression SEMI
 ;

expression
 : expression PLUS expression
 | STRING
 | ID
 ;

ASK    : 'ASK';
ANSWER : 'ANSWER';
INTO   : 'INTO';
ID     : [a-zA-Z]+;
PLUS   : '+';
SEMI   : ';';
SPACES : [ \t\r\n]+ -> skip;
STRING : '"' ( ~[\\"] | '\\' . )* '"';
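The key rule is STRING: a single token covers the whole literal, including escaped quotes and semicolons, so SEMI only ever matches a semicolon outside a string. As a rough illustration, here is a hand-written Python approximation of this lexer plus a small recursive-descent parser for the question, answer and expression rules. It is not ANTLR-generated code; the names tokenize, parse and SAMPLE are hypothetical, and the PLUS chain is flattened into a list of operands instead of a tree.

```python
import re

# Hypothetical Python approximation of the answer's first lexer (not ANTLR output).
# STRING mirrors '"' ( ~[\\"] | '\\' . )* '"': a character that is neither a
# backslash nor a quote, or a backslash followed by any character (\" or \;).
TOKEN_SPEC = [
    ("SPACES", r"[ \t\r\n]+"),
    ("STRING", r'"(?:[^\\"]|\\[\s\S])*"'),
    ("ID", r"[a-zA-Z]+"),
    ("PLUS", r"\+"),
    ("SEMI", r";"),
]
KEYWORDS = {"ASK", "ANSWER", "INTO"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (type, text) pairs; SPACES are dropped like '-> skip'."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"token recognition error at {pos}: {text[pos]!r}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SPACES":
            continue
        if kind == "ID" and value in KEYWORDS:
            kind = value  # an exact keyword match wins over ID
        tokens.append((kind, value))
    return tokens


def parse(tokens):
    """Recursive-descent sketch of parse : ( question | answer )* EOF ;"""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        pos += 1
        return tokens[pos - 1][1]

    def operand():
        return take("STRING") if peek() == "STRING" else take("ID")

    def expression():
        # expression PLUS expression | STRING | ID, flattened into a list
        operands = [operand()]
        while peek() == "PLUS":
            take("PLUS")
            operands.append(operand())
        return operands

    statements = []
    while peek() != "EOF":
        keyword = take("ASK") if peek() == "ASK" else take("ANSWER")
        operands = expression()
        target = None
        if keyword == "ASK" and peek() == "INTO":  # ( INTO ID )?
            take("INTO")
            target = take("ID")
        take("SEMI")
        statements.append((keyword, operands, target))
    return statements


SAMPLE = r'''ASK "Hey dude, what's about \";\"" + "?";
ASK "How old are you?" INTO inAge;
ASK "This is a
multiline String with \";\";" + " can you parse this?"; ANSWER "Sure, i can!";
'''

for statement in parse(tokenize(SAMPLE)):
    print(statement)
```

One design caveat: Python's re alternation takes the first alternative that matches, whereas ANTLR prefers the longest match. That difference is harmless here only because each token type starts with a different character, and keywords are separated from ID by checking the complete word after it has been matched.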

EDIT

Even without expressions, using only a few tokens, I don't see the need for lexical modes:

parse
 : ( question | answer )* EOF
 ;

question
 : ASK ~SEMI* SEMI OTHER*
 ;

answer
 : ANSWER ~SEMI* SEMI OTHER*
 ;

ASK    : 'ASK';
ANSWER : 'ANSWER';
SEMI   : ';';
STRING : '"' ( ~[\\"] | '\\' . )* '"';
OTHER  : ~[";];

which parses your example input into three question statements followed by one answer statement.
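The second grammar can be approximated the same way outside ANTLR. This Python sketch (hypothetical names split_statements and SAMPLE; a hand-written approximation, not generated code) keeps only STRING, SEMI, the two keywords and a single-character OTHER token, then groups the tokens as a keyword, everything up to the next SEMI, the SEMI itself, and any trailing OTHER tokens, mirroring ASK ~SEMI* SEMI OTHER*.

```python
import re

# Hypothetical Python approximation of the answer's second lexer (not ANTLR output).
# Alternatives are tried in order: STRING, SEMI, the two keywords, then OTHER,
# a single character that is neither '"' nor ';' (like OTHER : ~[";]).
TOKEN = re.compile(
    r'(?P<STRING>"(?:[^\\"]|\\[\s\S])*")'
    r"|(?P<SEMI>;)"
    r"|(?P<KEYWORD>ASK|ANSWER)"
    r'|(?P<OTHER>[^";])'
)


def split_statements(text):
    """Group tokens like ( question | answer )* where each rule is
    KEYWORD ~SEMI* SEMI OTHER*; returns (keyword, body) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"token recognition error at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()

    statements, i = [], 0
    while i < len(tokens):
        kind, keyword = tokens[i]
        if kind != "KEYWORD":
            raise SyntaxError(f"expected ASK or ANSWER, found {keyword!r}")
        i += 1
        body = []
        while i < len(tokens) and tokens[i][0] != "SEMI":  # ~SEMI*
            body.append(tokens[i][1])
            i += 1
        if i == len(tokens):
            raise SyntaxError("missing ';'")
        i += 1  # SEMI
        while i < len(tokens) and tokens[i][0] == "OTHER":  # OTHER*
            i += 1
        statements.append((keyword, "".join(body).strip()))
    return statements


SAMPLE = r'''ASK "Hey dude, what's about \";\"" + "?";
ASK "How old are you?" INTO inAge;
ASK "This is a
multiline String with \";\";" + " can you parse this?"; ANSWER "Sure, i can!";
'''

for keyword, body in split_statements(SAMPLE):
    print(keyword, "->", body)
```

Because STRING is tried first and consumes the whole literal, the semicolons in \";\" never reach the SEMI alternative, which is the same effect the ANTLR STRING rule has in the grammar above.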
