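I sat down for a while and wrote a full FSM parser, just for interest's sake. (At least under PHP; in Perl I could do this with a recursive regex, but not in PHP, which doesn't have that feature.) It does have a few features you're unlikely to get from a regex, though: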
- Smart, stack-based bracket parsing
- Any-bracket support
- Modular
- Extensible
- When the syntax is wrong, it can tell you roughly where.
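Granted, there's a fair amount of code here, and it's a bit complicated and convoluted for new coders, but for what it is, it's pretty awesome stuff.
It's not a finished product, just something I threw together, but it works and has no bugs that I could find.
I've used die() in a lot of places where it would normally be better to use exceptions and whatnot, so a cleanup and refactor would be preferable before rolling it out.
It has a reasonable amount of comments, but I feel that if I commented it any further, the nitty-gritty of how the finite state machine works would become harder to follow.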
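Stripped of the tokenising, the stack idea that close_bracket() relies on can be sketched on its own. This is only an illustration, assuming the same four bracket pairs; the function name brackets_balanced is hypothetical and is not used by the parser itself.

```php
<?php
// Hypothetical helper, not part of the parser: reports whether every bracket
// in $input is closed by its matching closer, in the right order.
function brackets_balanced(string $input): bool {
    $map = array('(' => ')', '[' => ']', '{' => '}', '<' => '>');
    $closers = array_flip($map);   // closer => opener, used as a lookup set
    $stack = array();              // closers still expected, innermost last
    foreach (str_split($input) as $char) {
        if (isset($map[$char])) {
            $stack[] = $map[$char];             // opener: expect its closer later
        } elseif (isset($closers[$char])) {
            if (array_pop($stack) !== $char) {  // wrong or unexpected closer
                return false;
            }
        }
    }
    return count($stack) === 0;                 // nothing may be left open
}

var_dump(brackets_balanced('get[max(a),min(b)]'));  // bool(true)
var_dump(brackets_balanced('get[max(a),min(b])'));  // bool(false)
```

The parser keeps the same kind of stack in $nests, but it also collects the text between the outermost pair as it goes.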
# Pretty Colour Debug of the tokeniser in action.
# Uncomment the print lines to use.
function debug($title, $stream, $msg, $remaining){
    # print chr(27) ."[31m$title" . chr(27) ."[0m\n";
    # print chr(27) ."[33min:$stream" . chr(27) ."[0m\n";
    # print chr(27) ."[32m$msg" . chr(27) ."[0m\n";
    # print chr(27) ."[34mstream:$remaining" . chr(27) ."[0m\n\n";
}
# Simple utility to store a captured part of the stream in one place
# and the remainder somewhere else
# Wraps most of the regexy stuff
# $regex must wrap the whole token in its first capture group
# Inspired by some Perl Regex Parser I found.
function get_token($regex, $input){
    $out = array(
        'success' => false,
        'match' => '',
        'rest' => ''
    );
    if(!preg_match('/^' . $regex . '/' , $input, $matches)){
        die("Could not match $regex at start of $input ");
        #return $out; # error condition, not matched.
    }
    $out['match'] = $matches[1];
    $out['rest'] = substr($input, strlen($out['match']));
    $out['success'] = true;
    debug('Scan For Token: '. $regex , $input, "matched: " . $out['match'] , $out['rest']);
    return $out;
}
function skip_space($input){
    return get_token('(\s*)', $input);
}
# Given $input and $opener, find
# the data stream that occurs until the respective closer.
# All nested bracket sets must be well balanced.
# No 'escape code' implementation has been done (yet)
# Match will contain the contents,
# Rest will contain unprocessed part of the string
# []{}() and <> bracket types are currently supported.
function close_bracket($input , $opener){
    $out = array(
        'success' => false,
        'match' => '',
        'rest' => ''
    );
    $map = array('(' => ')', '[' => ']', '{' => '}', chr(60) => '>');
    # Stack of closers we are still waiting for, innermost last
    $nests = array($map[$opener]);
    while(strlen($input) > 0){
        $d = get_token('([^()\[\]{}' . chr(60). '>]*?[()\[\]{}' . chr(60) . '>])', $input);
        $input = $d['rest'];
        if(!$d['success']){
            debug('Scan For) Bailing ' , $input, "depth: " . count($nests) . ", matched: " . $out['match'] , $out['rest']);
            $out['match'] .= $d['match'];
            return $out; # error condition, not matched. brackets are imbalanced.
        }
        # Work out which of the 4 bracket types we got, and
        # which orientation it is, and then decide if we're going up the tree or down it
        end($nests);
        $tail = substr($d['match'], -1, 1);
        if($tail == current($nests)){
            array_pop($nests);
        } elseif (array_key_exists($tail, $map)){
            array_push($nests, $map[$tail]);
        } else {
            die ("Error. Bad bracket Matching, unclosed/unbalanced/unmatching bracket sequence: " . $out['match'] . $d['match']);
        }
        $out['match'] .= $d['match'] ;
        $out['rest' ] = $d['rest'];
        debug('Scan For) running' , $input, "depth: " . count($nests) . ", matched: " . $out['match'] , $out['rest']);
        if (count($nests) == 0){
            # Chomp off the tail bracket to just get the body
            $out['match'] = substr($out['match'] , 0 , -1);
            $out['success'] = true;
            debug('Scan For) returning ' , $input, "matched: " . $out['match'] , $out['rest']);
            return $out;
        }
    }
    die('Scan for closing) exhausted buffer while searching. Brackets Mismatched. Fix this: \'' . $out['match'] . '\'');
}
# Given $function_name and $input, expects the form fnname(data)
# 'data' can be any well balanced bracket sequence
# also, brackets used for functions in the stream can be any of your choice,
# as long as you're consistent. fnname[foo] will work.
function parse_function_body($input, $function_name){
    $out = array (
        'success' => false,
        'match' => '',
        'rest' => '',
    );
    debug('Parsing ' . $function_name . "()", $input, "" , "");
    $d = get_token("(" . $function_name . '[({\[' . chr(60) . '])' , $input);
    if (!$d['success']){
        die("Doom while parsing for function $function_name. Not where it's expected.");
    }
    # The last character of the match is the opening bracket that was used
    $e = close_bracket($d['rest'] , substr($d['match'],-1,1));
    if (!$e['success']){
        die("Found Imbalanced Brackets while parsing for $function_name, last snapshot was '" . $e['match'] . "'");
        return $out; # imbalanced brackets for function
    }
    $out['success'] = true;
    $out['match'] = $e['match'];
    $out['rest'] = $e['rest'];
    debug('Finished Parsing ' . $function_name . "()", $input, 'body:'. $out['match'] , $out['rest']);
    return $out;
}
function parse_query($input){
    $eat = skip_space($input);
    $get = parse_function_body($eat['rest'] , 'get');
    if (!$get['success']){
        die("Get Token Malformed/Missing, instead found '" . $eat['rest'] . "'");
    }
    $eat = skip_space($get['rest']);
    $where = parse_function_body($eat['rest'], 'where');
    if (!$where['success']){
        die("Where Token Malformed/Missing, instead found '" . $eat['rest'] . "'");
    }
    $eat = skip_space($where['rest']);
    $sort = parse_function_body($eat['rest'], 'sort');
    if(!$sort['success']){
        die("Sort Token Malformed/Missing, instead found '" . $eat['rest'] . "'");
    }
    return array(
        'get' => $get['match'],
        'where' => $where['match'],
        'sort' => $sort['match'],
        '_Trailing_Data' => $sort['rest'],
    );
}
$structure = parse_query("get[max(fieldname1),min(fieldname2),fieldname3]where(something=something) sort(fieldname2 asc)");
print_r($structure);
$structure = parse_query("get(max(fieldname1),min(fieldname2),fieldname3)where(something=something) sort(fieldname2 asc)");
print_r($structure);
$structure = parse_query("get{max(fieldname1),min(fieldname2),fieldname3}where(something=something) sort(fieldname2 asc)");
print_r($structure);
$structure = parse_query("get" . chr(60) . "max(fieldname1),min(fieldname2),fieldname3" . chr(62) . "where(something=something) sort(fieldname2 asc)");
print_r($structure);
All of the print_r($structure) lines above should produce this:
Array
(
    [get] => max(fieldname1),min(fieldname2),fieldname3
    [where] => something=something
    [sort] => fieldname2 asc
    [_Trailing_Data] =>
)
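As for the cleanup mentioned earlier, here is a rough sketch of how one of the die() calls could become an exception, so that the caller decides how to report a syntax error. The names QuerySyntaxException and get_token_or_throw are hypothetical and exist only for this example; the contract otherwise mirrors get_token().

```php
<?php
// Hypothetical exception type, introduced only for this sketch.
class QuerySyntaxException extends Exception {}

// Same contract as get_token(): the first capture group is the token and
// everything after it is the rest. Failure throws instead of calling die().
function get_token_or_throw(string $regex, string $input): array {
    if (!preg_match('/^' . $regex . '/', $input, $matches)) {
        throw new QuerySyntaxException("Could not match $regex at start of '$input'");
    }
    return array(
        'match' => $matches[1],
        'rest'  => (string) substr($input, strlen($matches[1])),
    );
}

try {
    $d = get_token_or_throw('(get[(\[{<])', 'get[a,b]where(c=d)');
    echo $d['match'], ' | ', $d['rest'], "\n";   // get[ | a,b]where(c=d)
    // This one fails, because the input does not start with "where":
    get_token_or_throw('(where[(\[{<])', 'sort(a asc)');
} catch (QuerySyntaxException $e) {
    echo 'Syntax error: ', $e->getMessage(), "\n";
}
```

With this shape the "where did it go wrong" information travels in the exception message instead of ending the script.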
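Please post your current best effort. – Rob 2009-02-19 23:58:08
Are the outer words always get, where, and sort, or can they be anything? Please clarify. – 2009-02-20 00:09:48
They can be anything. Basically it just needs to grab whatever is between the two outer brackets (edited). – atomicharri 2009-02-20 00:13:46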