2012-02-15 54 views

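Simple ParseKit grammar for HTML with substitution variables

For an iOS app, I want to parse HTML files that may contain UNIX-style variables to be substituted. For example, the HTML might look like this: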

<html> 
    <head></head> 
    <body> 
    <h1>${title}</h1> 
    <p>${paragraph1}</p> 
    <img src="${image}" /> 
    </body> 
</html> 

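I'm trying to write a simple ParseKit grammar that gives me two callbacks: one for pass-through HTML and one for each variable it detects. To that end, I created the following grammar: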

@start  = Empty | content*; 

content  = variable | passThrough; 
passThrough = /[^$]+/; 
variable  = '$' '{' Word closeChar; 

openChar  = '${'; 
closeChar  = '}'; 

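I've run into at least two problems with this. First, I originally declared variable as openChar Word closeChar, but that didn't work (I still don't know why). Second, and more importantly, the parser stops when it reaches <img src="${image}" /> (that is, a variable inside a quoted string).

My questions are:

  1. How can I modify the grammar so that it works as expected?
  2. Would it be better to use the tokenizer directly? If so, how should I configure it?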

Answer


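Developer of ParseKit here. I'll answer both of your questions:

1) You're taking the right approach, but this is a tricky case. There are a few small gotchas, and your grammar needs to change a bit.

Here is a grammar that works for me: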

// Tokenizer Directives 
@symbolState = '"' "'"; // effectively tells the tokenizer to turn off QuoteState. 
         // Otherwise, variables enclosed in quotes would not be found (they'd be embedded in quoted strings). 
         // now single- & double-quotes will be recognized as individual symbols, not start- & end-markers for quoted strings 

@symbols = '${'; // declare '${' as a multi-char symbol 

@reportsWhitespaceTokens = YES; // tell the tokenizer to preserve/report whitespace 

// Grammar 
@start = content*; 
content = passthru | variable; 
passthru = /[^$].*/; 
variable = start name end; 
start = '${'; 
end = '}'; 
name = Word; 

Then implement these two callbacks in your assembler:

- (void)parser:(PKParser *)p didMatchName:(PKAssembly *)a { 
    NSLog(@"%s %@", __PRETTY_FUNCTION__, a); 
    PKToken *tok = [a pop]; 

    NSString *name = tok.stringValue; 
    // do something with name 
} 

- (void)parser:(PKParser *)p didMatchPassthru:(PKAssembly *)a { 
    NSLog(@"%s %@", __PRETTY_FUNCTION__, a); 
    PKToken *tok = [a pop]; 

    NSMutableString *s = a.target; 
    if (!s) {
        s = [NSMutableString string];
    }

    [s appendString:tok.stringValue]; 

    a.target = s; 
} 

Your client/driver code would then look like this:

NSString *g = ...; // fetch grammar
PKParser *p = [[PKParserFactory factory] parserFromGrammar:g assembler:self];
NSString *s = @"<img src=\"${image}\" />";
NSString *result = [p parse:s]; // parse once and keep the return value
NSLog(@"result: %@", result);

This prints:

result: <img src="" /> 

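2) Yes, I think it would definitely be better to use the tokenizer directly for this relatively simple case. Performance will be much better. Here's how you could do it with the tokenizer: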

PKTokenizer *t = [PKTokenizer tokenizerWithString:s]; 
[t setTokenizerState:t.symbolState from:'"' to:'"']; 
[t setTokenizerState:t.symbolState from:'\'' to:'\'']; 
[t.symbolState add:@"${"]; 
t.whitespaceState.reportsWhitespaceTokens = YES; 

NSMutableString *result = [NSMutableString string]; 

PKToken *eof = [PKToken EOFToken]; 
PKToken *tok = nil; 
while (eof != (tok = [t nextToken])) { 
    if ([@"${" isEqualToString:tok.stringValue]) {
        tok = [t nextToken];
        NSString *varName = tok.stringValue;

        // do something with variable
    } else if ([@"}" isEqualToString:tok.stringValue]) {
        // do nothing
    } else {
        [result appendString:tok.stringValue];
    }
} 
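
The "do something with variable" step above is left open. As a rough, library-independent sketch of what that step might do, here is the same pass-through-or-substitute loop in plain C (which also compiles inside an Objective-C file). It does not use ParseKit at all: Variable, lookup_variable, append, and substitute_variables are hypothetical names invented for this illustration. Unlike the tokenizer version, it treats everything between ${ and } as the variable name rather than requiring a single Word token.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair used for this illustration only. */
typedef struct {
    const char *name;
    const char *value;
} Variable;

/* Returns the value bound to name[0..len), or NULL if the name is unknown. */
static const char *lookup_variable(const Variable *vars, size_t count,
                                   const char *name, size_t len)
{
    for (size_t i = 0; i < count; i++) {
        if (strlen(vars[i].name) == len &&
            strncmp(vars[i].name, name, len) == 0) {
            return vars[i].value;
        }
    }
    return NULL;
}

/* Appends n bytes of src to out, keeping it NUL-terminated.
 * Returns 0 on success, -1 if out does not have enough room. */
static int append(char *out, size_t cap, size_t *used,
                  const char *src, size_t n)
{
    if (*used + n >= cap) {
        return -1; /* always keep one byte for the terminator */
    }
    memcpy(out + *used, src, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/* Copies in to out, replacing each ${name} with its value.
 * Unknown variables expand to the empty string, and an unterminated
 * "${" is copied through unchanged.
 * Returns 0 on success, -1 if out is too small. */
int substitute_variables(const char *in, const Variable *vars, size_t count,
                         char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0) {
        return -1;
    }
    out[0] = '\0';

    while (*in != '\0') {
        const char *open = strstr(in, "${");
        const char *close = open ? strchr(open + 2, '}') : NULL;

        if (close == NULL) {
            /* No complete variable left: the rest is pass-through text. */
            return append(out, cap, &used, in, strlen(in));
        }

        /* Pass-through text in front of the variable. */
        if (append(out, cap, &used, in, (size_t)(open - in)) != 0) {
            return -1;
        }

        /* The variable itself: substitute its value if we know one. */
        const char *value = lookup_variable(vars, count, open + 2,
                                            (size_t)(close - (open + 2)));
        if (value != NULL &&
            append(out, cap, &used, value, strlen(value)) != 0) {
            return -1;
        }

        in = close + 1; /* continue after the closing brace */
    }
    return 0;
}
```

With image bound to logo.png, calling substitute_variables on <img src="${image}" /> writes <img src="logo.png" /> into the output buffer. Expanding unknown names to the empty string matches the result printed by the grammar version above; whether that or an error is the better behaviour depends on the app.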

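Thanks, Todd! I'll go with the tokenizer approach, since it looks faster and is less complicated to implement. I do look forward to using grammars at some point, though. – pgb 2012-02-16 12:15:59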