Using ANTLR4 in Java to check that an expression is valid and its argument types are correct

Question

// Lexer: FunctionValidateLexer.g4

lexer grammar FunctionValidateLexer;

// A quoted string is a single token, so commas and parentheses inside quotes
// never reach the COMMA, L_BRACKET, or R_BRACKET rules.
QUOTED_CONTENT: SINGLE_QUOTED_STRING | DOUBLE_QUOTED_STRING;

// Structural characters outside quoted strings
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';

// Any run of characters other than whitespace, quotes, commas, and parentheses
NAME: ~[ \t\r\n,()'"]+;
WS : [ \t\r\n]+ -> skip;

// Quoted strings; a backslash escapes the character that follows it
fragment SINGLE_QUOTED_STRING: '\'' (~['\r\n\\] | '\\' ~[\r\n])* '\'';
fragment DOUBLE_QUOTED_STRING: '"' (~["\r\n\\] | '\\' ~[\r\n])* '"';

// Parser: FunctionValidateParser.g4

parser grammar FunctionValidateParser;
options { tokenVocab=FunctionValidateLexer; }

functions : function* EOF;
function : NAME L_BRACKET (argument (COMMA argument)*)? R_BRACKET;
argument: function | NAME | QUOTED_CONTENT;

In the lexer rules above, SINGLE_QUOTED_STRING and DOUBLE_QUOTED_STRING are fragment rules that match the text between single quotes or double quotes, including backslash-escaped characters such as an escaped quote.

The QUOTED_CONTENT token combines the two fragments. Because the lexer consumes a whole quoted string as one token, a comma or parenthesis inside quotes is part of that string and is never emitted as COMMA, L_BRACKET, or R_BRACKET.

No lexer modes are needed for this. Each of '(' , ')' and ',' is defined by exactly one rule, so there are no competing definitions for the same character, and the lexer command for leaving a mode would be written popMode without parentheses if modes were ever added.

NAME excludes quotes, commas, parentheses, and whitespace and accepts every other character, so unquoted arguments may contain characters such as [ ] % & * $ ^. An unquoted comma or parenthesis in the wrong position is reported by the parser as a syntax error.

These changes handle commas and parentheses within quoted strings differently from those outside of quoted strings, addressing your requirement to treat them as argument text only when they appear between single or double quotes.

The parser grammar remains largely unchanged. It refers to the tokens by name and accepts QUOTED_CONTENT as a valid argument, so quoted content, including commas and parentheses, is parsed as a single argument.

Original question:

I am new to ANTLR4. Using ANTLR4 and Java, how can I parse nested expressions, check whether each argument is an int, string, decimal, or boolean, and verify that the whole expression is valid?

Example:

1. toString("test")
2. mul(toNumber("1.6"),add(3.14,1.5))
3. getRandomNumber()
4. split(split("1/2,3/4,4/5",","),"/")
5. append("[1,2,3","]")
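
For the literal arguments in the examples above, one possible way to tell int, decimal, boolean, and string apart is to classify the raw text of each argument. The following is only a minimal sketch under assumptions: the class name `LiteralTypes`, the method `classify`, and the type labels are hypothetical and are not part of ANTLR or of the grammar in this question.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: classifies the raw text of a literal argument. */
final class LiteralTypes {
    private static final Pattern INT = Pattern.compile("[+-]?\\d+");
    private static final Pattern DECIMAL = Pattern.compile("[+-]?\\d+\\.\\d+");

    /** Returns "String", "boolean", "int", "decimal", or "unknown". */
    static String classify(String text) {
        boolean longEnough = text.length() >= 2;
        boolean doubleQuoted = longEnough && text.startsWith("\"") && text.endsWith("\"");
        boolean singleQuoted = longEnough && text.startsWith("'") && text.endsWith("'");
        if (doubleQuoted || singleQuoted) return "String";
        if (text.equals("true") || text.equals("false")) return "boolean";
        if (INT.matcher(text).matches()) return "int";
        if (DECIMAL.matcher(text).matches()) return "decimal";
        // Anything else (for example a bare word) is left for the caller to reject.
        return "unknown";
    }

    public static void main(String[] args) {
        String[] samples = {"\"1/2,3/4,4/5\"", "3.14", "42", "true", "abc"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

In a visitor, the same check could be applied to the text of each literal token; whether an int should also be accepted where a decimal is expected is a design decision this sketch does not make.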

The map below holds the expression names used to check whether an expression is valid.

Map<String,String> map=new HashMap<>();
map.put("toString","String");
map.put("mul","decimal,decimal");
map.put("toNumber","String");
map.put("add","decimal,decimal");
map.put("generateRandomNumber","");

Using this map, we have to check that each expression name is correct and, for a nested expression, that its return type is correct, since it becomes an argument of another expression. If the expression name is correct, we also have to check whether the arguments are correct. I have written a lexer and a parser and they work, but they fail for inputs containing characters such as [, ], ", ' and the comma, because the comma (,) is also used in an expression to separate arguments. The lexer and parser are below.
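
The name, argument, and return-type checks described above can be sketched independently of ANTLR. In the sketch below the parameter types are taken from the map in the question, while the return types, the class `SignatureCheck`, and its nested `Signature` and `Expr` types are assumptions made purely for illustration; in a real visitor the `Expr` tree would be built from the parse tree.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: checks names, arity, and argument types of nested calls. */
final class SignatureCheck {
    /** Parameter types plus an ASSUMED return type for one function. */
    static final class Signature {
        final String returns;
        final List<String> params;
        Signature(String returns, String... params) {
            this.returns = returns;
            this.params = Arrays.asList(params);
        }
    }

    /** An argument is either a literal (args == null) or a nested call. */
    static final class Expr {
        final String nameOrType;
        final List<Expr> args;
        private Expr(String nameOrType, List<Expr> args) {
            this.nameOrType = nameOrType;
            this.args = args;
        }
        static Expr literal(String type) { return new Expr(type, null); }
        static Expr call(String name, Expr... args) { return new Expr(name, Arrays.asList(args)); }
    }

    static final Map<String, Signature> SIGNATURES = new HashMap<>();
    static {
        // Parameter types follow the question's map; return types are assumptions.
        SIGNATURES.put("toString", new Signature("String", "String"));
        SIGNATURES.put("mul", new Signature("decimal", "decimal", "decimal"));
        SIGNATURES.put("toNumber", new Signature("decimal", "String"));
        SIGNATURES.put("add", new Signature("decimal", "decimal", "decimal"));
        SIGNATURES.put("generateRandomNumber", new Signature("decimal"));
    }

    /** Returns the result type of expr, or throws if a name, arity, or type is wrong. */
    static String typeOf(Expr expr) {
        if (expr.args == null) return expr.nameOrType; // literal: its own type
        Signature sig = SIGNATURES.get(expr.nameOrType);
        if (sig == null) {
            throw new IllegalArgumentException("Unknown function: " + expr.nameOrType);
        }
        if (sig.params.size() != expr.args.size()) {
            throw new IllegalArgumentException("Wrong argument count for " + expr.nameOrType);
        }
        for (int i = 0; i < sig.params.size(); i++) {
            // A nested call contributes its return type as the argument type.
            String actual = typeOf(expr.args.get(i));
            if (!sig.params.get(i).equals(actual)) {
                throw new IllegalArgumentException("Argument " + (i + 1) + " of "
                        + expr.nameOrType + ": expected " + sig.params.get(i) + ", got " + actual);
            }
        }
        return sig.returns;
    }

    public static void main(String[] args) {
        // Corresponds to mul(toNumber("1.6"), add(3.14, 1.5))
        Expr ok = Expr.call("mul",
                Expr.call("toNumber", Expr.literal("String")),
                Expr.call("add", Expr.literal("decimal"), Expr.literal("decimal")));
        System.out.println(typeOf(ok)); // prints "decimal"
    }
}
```

Storing the return type next to the parameter types is what lets a nested call be checked as an argument of its enclosing call; the map in the question holds only one string per name, so it would need a second value for this.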

Lexer:
FunctionValidateLexer.g4

lexer grammar FunctionValidateLexer;
NAME: [A-Za-z0-9."`~!@#+%_-]+;
PERCENT:'%';
ASTERICK:'*';
OPENSQBRKET:'\\[';
CLOSEDSQBRKET:'\\]';
AMPERSAND:'&';
CAP:'^';
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
HIPHEN:'-';
UNDERSCORE:'_';
DOLLAR:'$';
PLUS:'+';
WS : [ \t\r\n]+ -> skip;

Parser:
FunctionValidateParser.g4

parser grammar FunctionValidateParser;
options { tokenVocab=FunctionValidateLexer; }
functions : function* EOF;
function : NAME '(' (argument (COMMA argument)*)? ')';
argument: (NAME | function );

I have written a visitor for validating the expression names and arguments, but I am having trouble defining a lexer and parser that accept the required arguments.

How can I change the lexer and parser so that arguments accept all characters except the comma (,) and round brackets ( ( and ) )? A comma or round bracket should be treated as part of an argument only when it appears between double or single quotes (like ',' or "," or "(" or ")").

In other words, I want to accept characters such as ` ! @ # $ % ^ & * [ ] / ? < > : ; " " \ | . + - } { . But because round brackets and the comma are part of the expression syntax, they should be accepted only between single or double quotes; otherwise an error should be thrown. How can I modify my lexer and parser to meet this requirement?
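
The rule stated above, that commas and round brackets are structural unless they sit inside quotes, can be illustrated without ANTLR by a small hand-written tokenizer. This is a sketch under assumptions, not the generated lexer: the class `QuoteAwareTokenizer` is hypothetical, and escaped quotes inside strings are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits input into names, quoted strings, and ( ) , tokens. */
final class QuoteAwareTokenizer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace between tokens is skipped
            } else if (c == ',' || c == '(' || c == ')') {
                tokens.add(String.valueOf(c)); // structural character outside quotes
                i++;
            } else if (c == '"' || c == '\'') {
                // Everything up to the matching quote is one token, commas included.
                int end = input.indexOf(c, i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated string at index " + i);
                }
                tokens.add(input.substring(i, end + 1));
                i = end + 1;
            } else {
                // Any other run of characters, e.g. [ ] % & * $, forms a bare name.
                int start = i;
                while (i < input.length() && !isDelimiter(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            }
        }
        return tokens;
    }

    private static boolean isDelimiter(char c) {
        return c == ',' || c == '(' || c == ')' || c == '"' || c == '\''
                || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        // The quoted arguments keep their commas and brackets.
        System.out.println(tokenize("append(\"[1,2,3\",\"]\")"));
        System.out.println(tokenize("split(\",\", '(')"));
    }
}
```

An ANTLR lexer achieves the same effect when a string rule matches the whole quoted text as one token, because the comma and bracket rules then never see the characters inside it.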

Answer 1

Score: 1

I don't understand why you're not matching strings: "... ". This makes no sense to me. The following grammar parses all of your example input:

parse     : function* EOF;
function  : ID '(' expr_list? ')';
expr_list : expr (',' expr)*;
expr      : function | STRING | NUMBER | ID;

STRING    : '"' ~'"'* '"';
NUMBER    : [0-9]+ ('.' [0-9]+)?;
ID        : [a-zA-Z_] [a-zA-Z_0-9]*;
SPACES    : [ \t\r\n]+ -> skip;

[![enter image description here][1]][1]
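
To make the structure of the grammar above easier to follow, here is a hand-written recursive-descent recognizer that mirrors its rules one to one. It is only a sketch and not code generated by ANTLR: the class `MiniFunctionParser` and its method names are hypothetical, and the token patterns are regular-expression transcriptions of the STRING, NUMBER, and ID rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical hand-written recognizer that mirrors the grammar's rules. */
final class MiniFunctionParser {
    // STRING | NUMBER | ID | one of ( ) , with leading whitespace skipped.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:\"[^\"]*\"|[0-9]+(?:\\.[0-9]+)?|[a-zA-Z_][a-zA-Z_0-9]*|[(),])");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private MiniFunctionParser(String input) {
        Matcher m = TOKEN.matcher(input);
        int at = 0;
        while (at < input.length()) {
            m.region(at, input.length());
            if (!m.lookingAt()) break;
            tokens.add(m.group().trim());
            at = m.end();
        }
        if (!input.substring(at).trim().isEmpty()) {
            throw new IllegalArgumentException("Unexpected input near index " + at);
        }
    }

    /** parse : function* EOF */
    static boolean isValid(String input) {
        try {
            MiniFunctionParser p = new MiniFunctionParser(input);
            while (p.pos < p.tokens.size()) p.function();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** function : ID '(' expr_list? ')' with expr_list : expr (',' expr)* */
    private void function() {
        if (!isId(peekAt(pos))) throw error("function name");
        pos++;
        expect("(");
        if (!")".equals(peekAt(pos))) {
            expr();
            while (",".equals(peekAt(pos))) {
                pos++;
                expr();
            }
        }
        expect(")");
    }

    /** expr : function | STRING | NUMBER | ID */
    private void expr() {
        String t = peekAt(pos);
        if (t == null) throw error("expression");
        if (isId(t) && "(".equals(peekAt(pos + 1))) {
            function();
            return;
        }
        char c = t.charAt(0);
        if (c == '"' || Character.isDigit(c) || isId(t)) {
            pos++;
            return;
        }
        throw error("expression");
    }

    private String peekAt(int index) {
        return index < tokens.size() ? tokens.get(index) : null;
    }

    private static boolean isId(String t) {
        return t != null && (Character.isLetter(t.charAt(0)) || t.charAt(0) == '_');
    }

    private void expect(String symbol) {
        if (!symbol.equals(peekAt(pos))) throw error("'" + symbol + "'");
        pos++;
    }

    private IllegalArgumentException error(String expected) {
        return new IllegalArgumentException("Expected " + expected + " at token " + pos);
    }

    public static void main(String[] args) {
        System.out.println(isValid("split(split(\"1/2,3/4,4/5\",\",\"),\"/\")"));
        System.out.println(isValid("mul(1,)"));
    }
}
```

Because the string pattern consumes everything up to the closing double quote, the commas inside `"1/2,3/4,4/5"` never act as argument separators, which is the same reason the STRING rule resolves the problem in the grammar.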


  [1]: https://i.stack.imgur.com/H74Sa.png
