How to debug ANTLR4 grammar extraneous / mismatched input errors

I want to parse a rulebook file, "demo.rb", like the one below:

rulebook Titanic-Normalization { 
    version 1 

    meta { 
    description "Test" 
    source "my-rules.xslx" 
    user "joltie" 
    } 

    rule remove-first-line { 
    description "Removes first line when offset is zero" 
    when(present(offset) && offset == 0) then { 
     filter-row-if-true true; 
    } 
    } 
}

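The ANTLR4 grammar file I wrote, Rulebook.g4, is shown below. For now it parses most of a *.rb file well, but it throws unexpected errors when it reaches the "expression" / "statement" rules.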

grammar Rulebook; 

rulebookStatement 
    : KWRulebook 
     (GeneralIdentifier | Identifier) 
     '{' 
     KWVersion 
     VersionConstant 
     metaStatement 
     (ruleStatement)+ 
     '}' 
    ; 

metaStatement 
    : KWMeta 
     '{' 
     KWDescription 
     StringLiteral 
     KWSource 
     StringLiteral 
     KWUser 
     StringLiteral 
     '}' 
    ; 

ruleStatement 
    : KWRule 
     (GeneralIdentifier | Identifier) 
     '{' 
     KWDescription 
     StringLiteral 
     whenThenStatement 
     '}' 
    ; 

whenThenStatement 
    : KWWhen '(' expression ')' 
     KWThen '{' statement '}' 
    ; 

primaryExpression 
    : GeneralIdentifier 
    | Identifier 
    | StringLiteral+ 
    | '(' expression ')' 
    ; 

postfixExpression 
    : primaryExpression 
    | postfixExpression '[' expression ']' 
    | postfixExpression '(' argumentExpressionList? ')' 
    | postfixExpression '.' Identifier 
    | postfixExpression '->' Identifier 
    | postfixExpression '++' 
    | postfixExpression '--' 
    ; 

argumentExpressionList 
    : assignmentExpression 
    | argumentExpressionList ',' assignmentExpression 
    ; 

unaryExpression 
    : postfixExpression 
    | '++' unaryExpression 
    | '--' unaryExpression 
    | unaryOperator castExpression 
    ; 

unaryOperator 
    : '&' | '*' | '+' | '-' | '~' | '!' 
    ; 

castExpression 
    : unaryExpression 
    | DigitSequence // for 
    ; 

multiplicativeExpression 
    : castExpression 
    | multiplicativeExpression '*' castExpression 
    | multiplicativeExpression '/' castExpression 
    | multiplicativeExpression '%' castExpression 
    ; 

additiveExpression 
    : multiplicativeExpression 
    | additiveExpression '+' multiplicativeExpression 
    | additiveExpression '-' multiplicativeExpression 
    ; 

shiftExpression 
    : additiveExpression 
    | shiftExpression '<<' additiveExpression 
    | shiftExpression '>>' additiveExpression 
    ; 

relationalExpression 
    : shiftExpression 
    | relationalExpression '<' shiftExpression 
    | relationalExpression '>' shiftExpression 
    | relationalExpression '<=' shiftExpression 
    | relationalExpression '>=' shiftExpression 
    ; 

equalityExpression 
    : relationalExpression 
    | equalityExpression '==' relationalExpression 
    | equalityExpression '!=' relationalExpression 
    ; 

andExpression 
    : equalityExpression 
    | andExpression '&' equalityExpression 
    ; 

exclusiveOrExpression 
    : andExpression 
    | exclusiveOrExpression '^' andExpression 
    ; 

inclusiveOrExpression 
    : exclusiveOrExpression 
    | inclusiveOrExpression '|' exclusiveOrExpression 
    ; 

logicalAndExpression 
    : inclusiveOrExpression 
    | logicalAndExpression '&&' inclusiveOrExpression 
    ; 

logicalOrExpression 
    : logicalAndExpression 
    | logicalOrExpression '||' logicalAndExpression 
    ; 

conditionalExpression 
    : logicalOrExpression ('?' expression ':' conditionalExpression)? 
    ; 

assignmentExpression 
    : conditionalExpression 
    | unaryExpression assignmentOperator assignmentExpression 
    | DigitSequence // for 
    ; 

assignmentOperator 
    : '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|=' 
    ; 

expression 
    : assignmentExpression 
    | expression ',' assignmentExpression 
    ; 

statement 
    : expressionStatement 
    ; 

expressionStatement 
    : expression+ ';' 
    ; 


KWRulebook: 'rulebook'; 
KWVersion: 'version'; 
KWMeta: 'meta'; 
KWDescription: 'description'; 
KWSource: 'source'; 
KWUser: 'user'; 
KWRule: 'rule'; 
KWWhen: 'when'; 
KWThen: 'then'; 
KWTrue: 'true'; 
KWFalse: 'false'; 

fragment 
LeftParen : '('; 

fragment 
RightParen : ')'; 

fragment 
LeftBracket : '['; 

fragment 
RightBracket : ']'; 

fragment 
LeftBrace : '{'; 

fragment 
RightBrace : '}'; 


Identifier 
    : IdentifierNondigit 
     ( IdentifierNondigit 
     | Digit 
     )* 
    ; 

GeneralIdentifier 
    : Identifier 
     ('-' Identifier)+ 
    ; 

fragment 
IdentifierNondigit 
    : Nondigit 
    //| // other implementation-defined characters... 
    ; 

VersionConstant 
    : DigitSequence ('.' DigitSequence)* 
    ; 

DigitSequence 
    : Digit+ 
    ; 

fragment 
Nondigit 
    : [a-zA-Z_] 
    ; 

fragment 
Digit 
    : [0-9] 
    ; 

StringLiteral 
    : '"' SCharSequence? '"' 
    | '\'' SCharSequence? '\'' 
    ; 

fragment 
SCharSequence 
    : SChar+ 
    ; 

fragment 
SChar 
    : ~["\\\r\n] 
    | '\\\n' // Added line 
    | '\\\r\n' // Added line 
    ; 

Whitespace 
    : [ \t]+ 
     -> skip 
    ; 

Newline 
    : ( '\r' '\n'? 
     | '\n' 
     ) 
     -> skip 
    ; 

BlockComment 
    : '/*' .*? '*/' 
     -> skip 
    ; 

LineComment 
    : '//' ~[\r\n]* 
     -> skip 
    ;

I test the Rulebook parser with a unit test like the one below:

public void testScanRulebookFile() throws IOException { 
     String fileName = "C:\\rulebooks\\demo.rb"; 
     FileInputStream fis = new FileInputStream(fileName); 
     // create a CharStream that reads from the rulebook file 
     CharStream input = CharStreams.fromStream(fis); 

     // create a lexer that feeds off of input CharStream 
     RulebookLexer lexer = new RulebookLexer(input); 

     // create a buffer of tokens pulled from the lexer 
     CommonTokenStream tokens = new CommonTokenStream(lexer); 

     // create a parser that feeds off the tokens buffer 
     RulebookParser parser = new RulebookParser(tokens); 


     RulebookStatementContext context = parser.rulebookStatement(); 
//  WhenThenStatementContext context = parser.whenThenStatement(); 

     System.out.println(context.toStringTree(parser)); 

//  ParseTree tree = parser.getContext(); // begin parsing at init rule 
//  System.out.println(tree.toStringTree(parser)); // print LISP-style tree 
    }

For the "demo.rb" above, the parser reports the error below. I also print the RulebookStatementContext with toStringTree.

line 12:25 mismatched input '&&' expecting ')' 
(rulebookStatement rulebook Titanic-Normalization { version 1 (metaStatement meta { description "Test" source "my-rules.xslx" user "joltie" }) (ruleStatement rule remove-first-line { description "Removes first line when offset is zero" (whenThenStatement when ((expression (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (postfixExpression (primaryExpression present)) ((argumentExpressionList (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression offset)))))))))))))))))))))))))))))))))) && offset == 0) then { filter-row-if-true true ;) }) })

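I also wrote a unit test that feeds the whenThenStatement rule a short input such as "when (offset == 0) then {\n" + "filter-row-if-true true;\n" + "}\n" to narrow the problem down. But it still produces errors like: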

line 1:16 mismatched input '0' expecting {'(', '++', '--', '&&', '&', '*', '+', '-', '~', '!', Identifier, GeneralIdentifier, DigitSequence, StringLiteral} 
line 2:19 extraneous input 'true' expecting {'(', '++', '--', '&&', '&', '*', '+', '-', '~', '!', ';', Identifier, GeneralIdentifier, DigitSequence, StringLiteral}

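After two days of trying, I have not made any progress. The question is as long as it is above; could someone please give me some advice on how to debug ANTLR4 grammar extraneous / mismatched input errors?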

Asked 2017-08-10 by Zhenglinj

I don't know whether there is a more sophisticated way to debug a grammar/parser, but here is how I usually do it:

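1. Reduce the input that causes the problem to as few characters as possible.
2. Reduce your grammar as far as possible so that it still produces the same error on that input. Most of the time this means writing a minimal grammar for the reduced input by recycling the rules of the original grammar and simplifying them as far as you can.
3. Make sure the lexer segments the input correctly. The feature in ANTLRWorks that shows you the lexer output is great for this.
4. Have a look at the parse tree. ANTLR's TestRig can display the parse tree graphically (this is available through ANTLRWorks or through ANTLR's own tooling), so you can see where the parser's interpretation differs from the one you have in mind.
5. Do the parsing "by hand". That means you take your grammar and walk through the input yourself, step by step, without applying any of your own logic, assumptions, or knowledge along the way. Just follow your grammar the way a computer would. Question every step you take (is there another way to match the input?) and always try to match the input in some way other than the one you actually want.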
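
Point 3 can be rehearsed without ANTLR at all. Below is a minimal sketch in plain Java, not ANTLR itself: it imitates the two tie-breaking rules an ANTLR lexer applies (the longest match wins; on a tie, the rule that comes first wins) on a hand-copied subset of the token rules from the question's grammar. The class name `LexerTieBreakSketch`, the collective `Literal` rule, and the regular expressions are my own stand-ins, so treat the result as a hint about what to check, not as the real lexer's output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a generated lexer; it only imitates the tie-breaking. */
class LexerTieBreakSketch {

    // Rules in the order the lexer would rank them for this subset:
    // literals used inside parser rules first, then the named lexer rules
    // in the order they appear in Rulebook.g4. Regexes are approximations.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Literal", Pattern.compile("==|&&|[(){};]"));
        RULES.put("KWTrue", Pattern.compile("true"));
        RULES.put("Identifier", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"));
        RULES.put("GeneralIdentifier",
                Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*(-[a-zA-Z_][a-zA-Z_0-9]*)+"));
        RULES.put("VersionConstant", Pattern.compile("[0-9]+(\\.[0-9]+)*"));
        RULES.put("DigitSequence", Pattern.compile("[0-9]+"));
    }

    /** Splits the input into "RuleName:text" entries, skipping whitespace. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer matches replace the current best, so on a
                // tie the rule that was listed earlier keeps the token.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("offset == 0"));
        System.out.println(tokenize("filter-row-if-true true;"));
    }
}
```

In this sketch `0` comes out as VersionConstant rather than DigitSequence, because both rules match one character and VersionConstant is listed first, and `true` comes out as KWTrue rather than Identifier for the same reason. Neither VersionConstant nor KWTrue is accepted anywhere in the question's expression rules, which would be consistent with the "mismatched input '0'" and "extraneous input 'true'" messages. To confirm this against the real generated lexer, one option is to run TestRig with its `-tokens` option, or to call `fill()` on the `CommonTokenStream` and print each entry of `getTokens()`.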

Try to fix the error in the minimal grammar first, then migrate the solution to your real grammar.

Answered 2017-08-10 16:13:45 by Raven

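Thanks for your detailed help. I have already tried points 1, 2 and 5, but I think I should try harder to break down the input and the grammar. If you have time, could you help me work through this specific example? – Zhenglinj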

Have you had a look at the parse tree? – Raven

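Yes. I covered the parse tree under point 5. I have also read many similar questions on Stack Overflow, but the result still looks odd. I tried to write a unit test for the "whenThenStatement" rule with a simple grammar and a simple input. – Zhenglinj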
