Antlr4/Java : how to make a semantic predicate that skips a token (lexer) according to the parser rule that calls it


I would like to use my lexer rule

    NEW_LINE : '\n' -> skip;

like a normal rule. By that I mean: I want new lines to be ignored except where they are mandatory, to get a Python-like syntax. For example, the new line here is ignored:

    cook("banana",
    "potatoe")

but the new line between two statements must not be skipped. For example, this is not allowed:

    cook("banana", "potatoe") varA = 12.4

There must be a new line between cook() and the assignment. So I need to skip new lines in some places and still require them in others.

This is why I got this idea:

    start
     : line*
     ;

    line
     : line_expression (NEW_LINE | EOF)
     ;

    line_expression
     : expression
     | assignment
     ;

    expression
     : Decimal
     | Integer
     | Text
     | Boolean
     ;

and then write a semantic predicate along the lines of "if the calling parser rule is not line, skip() the token".
Now I just need help doing that.

I hope that was clear!

PS: I'm using Java as my main language, in case that wasn't clear.


Score: 1




You could keep track of how many ( you have encountered, decreasing that number whenever you encounter a ). Then you only create NL tokens when the number is zero.

Here's a quick demo:

    grammar T;

    @lexer::members {
      // current nesting depth of parentheses
      int parensLevel = 0;
    }

    parse
     : .*? EOF
     ;

    OPAR    : '(' {parensLevel++;};
    CPAR    : ')' {parensLevel--;};
    NUMBER  : [0-9]+ ('.' [0-9]+)?;
    STRING  : '"' ~'"'* '"';
    ASSIGN  : '=';
    COMMA   : ',';
    ID      : [a-zA-Z]+;
    SPACES  : [ \t]+ -> skip;

    // outside parentheses, line breaks become NL tokens
    NL      : {parensLevel == 0}? [\r\n]+;
    // otherwise this rule matches the same text and discards it
    NL_SKIP : [\r\n]+ -> skip;

If you feed the lexer the following input:

    cook("banana",
    "potatoe")
    varA = 12.4

the following tokens will be created:

    ID      `cook`
    '('     `(`
    STRING  `"banana"`
    ','     `,`
    STRING  `"potatoe"`
    ')'     `)`
    NL      `\n`
    ID      `varA`
    '='     `=`
    NUMBER  `12.4`

As you can see, the line break inside the parens is skipped, while the one after the ) becomes an NL token.
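To see the counting logic in isolation, here is a minimal plain-Java sketch of the same idea. It is not ANTLR-generated code and does not use the ANTLR runtime. The class name `NewlineDepthSketch` and the method `significantNewlines` are made up for illustration, and the `inString` flag is a simplified stand-in for the `STRING` rule, so that parentheses inside quotes are not counted. Like the grammar, it does nothing to stop the counter from going negative on a stray `)`.

```java
import java.util.ArrayList;
import java.util.List;

public class NewlineDepthSketch {

    /** Returns the start offsets of line-break runs that would become NL tokens. */
    static List<Integer> significantNewlines(String input) {
        List<Integer> offsets = new ArrayList<>();
        int parensLevel = 0;       // same counter as in the grammar
        boolean inString = false;  // assumption: mimics STRING swallowing its content
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (inString) {
                if (c == '"') inString = false;  // closing quote ends the string
                i++;
            } else if (c == '"') {
                inString = true;
                i++;
            } else if (c == '(') {
                parensLevel++;
                i++;
            } else if (c == ')') {
                parensLevel--;
                i++;
            } else if (c == '\r' || c == '\n') {
                int start = i;
                // consume the whole run, like [\r\n]+
                while (i < input.length()
                        && (input.charAt(i) == '\r' || input.charAt(i) == '\n')) {
                    i++;
                }
                // keep the run only at nesting depth zero, otherwise drop it
                if (parensLevel == 0) offsets.add(start);
            } else {
                i++;
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String input = "cook(\"banana\",\n\"potatoe\")\nvarA = 12.4";
        System.out.println(significantNewlines(input)); // prints [25]
    }
}
```

Only the line break after the closing `)` is reported, at offset 25. The one after the comma is dropped because the counter is 1 at that point.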

  • Posted on August 14, 2020 at 01:48:04
