2013-02-20
0

Validating and grouping an Excel formula format

I want to validate the format of an Excel formula with the following regular expression:

=SUM\(((?:\w+\d+)(?::\w+\d+)?)((?:,\w+\d+)(?::\w+\d+)?)*\) 

against this input:

Should pass

=SUM(A1,A11:A212,A12:A56,A342:A12,A3) 
=SUM(A11:A12,A12:a12,A34:A3) 
=SUM(A1,A2,A3) 
=SUM(A1) 

Should fail

=SUM(A11:A212:A2,A12:A56,A4,A342:A12) 

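I have the validation part working, but I can't figure out how to capture each comma-separated value in its own group.

How I would like them to be grouped: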

=SUM(A1,A11:A12,A12:A56,A3)  // Groups: $1 = A1 $2 = A11:A12 $3 = A12:A56 $4 = A3 
=SUM(A11:A12,A10:A12,A34:A3) // Groups: $1 = A11:A12 $2 = A10:A12 $3 = A34:A3 
=SUM(A1,A2,A3)     //Groups: $1 = A1 $2 = A2 $3 = A3 
=SUM(A1)      //Groups: $1 = A1 

How they are currently being grouped:

=SUM(A1,A11:A12,A12:A56,A3)  // Groups: $1 = A1 $2 = A3 
=SUM(A11:A12,A10:A12,A34:A3) // Groups: $1 = A11:A12 $2 = A34:A3 
=SUM(A1,A2,A3)     //Groups: $1 = A1 $2 = A3 
=SUM(A1)      //Groups: $1 = A1 

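Notice that it captures only the first and the last value. I'm very new to regex, so if I'm doing something terrible here, please point me in the right direction. Thanks!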

Answers

1

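This is not possible: (...)(?:,(...))+ (2 groups) will always produce 2 captures, no matter how many times the + matches. In most regex engines a capturing group inside a repetition keeps only the text from its last iteration.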
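To see this concretely, here is a small sketch using Python's `re` module. Python is only my choice for illustration (the question does not name a language), and the variable names are hypothetical. The pattern is the one from the question; note that its second group also includes the leading comma.

```python
import re

# Pattern from the question. Group 1 is the first value; group 2 sits inside
# a repeated (...)* and is overwritten on every iteration.
pattern = re.compile(
    r"=SUM\(((?:\w+\d+)(?::\w+\d+)?)((?:,\w+\d+)(?::\w+\d+)?)*\)"
)

m = pattern.fullmatch("=SUM(A1,A11:A12,A12:A56,A3)")
first, last = m.groups()

print(first)  # A1
print(last)   # ,A3  (only the final iteration of the repeated group survives)
```

The middle arguments take part in the match, but nothing is left of them in the groups afterwards, which is why a second pass over the matched text is needed.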

You need to do this in (at least) two steps:

value  := /\w+\d+(?::\w+\d+)?/ 

value_list := /value(?:,value)*/ 

expression := /=SUM\((value_list)\)/ 

Now match expression, take group 1 (the value_list), and find every occurrence of value within that match.
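The same two steps can be sketched in Python as a rough illustration. The function name `sum_arguments` and the use of `fullmatch` (which anchors the pattern to a single formula string) are my assumptions, not part of the answer.

```python
import re

# Build the pattern bottom-up, mirroring value / value_list / expression.
VALUE = r"\w+\d+(?::\w+\d+)?"            # A1 or A11:A212
VALUE_LIST = rf"{VALUE}(?:,{VALUE})*"     # value,value,...
EXPRESSION = re.compile(rf"=SUM\(({VALUE_LIST})\)")


def sum_arguments(formula):
    """Return the list of SUM arguments, or None if the formula is invalid."""
    m = EXPRESSION.fullmatch(formula)     # step 1: validate the whole formula
    if m is None:
        return None
    return re.findall(VALUE, m.group(1))  # step 2: pull each value out of group 1


print(sum_arguments("=SUM(A1,A11:A212,A12:A56,A342:A12,A3)"))
print(sum_arguments("=SUM(A11:A212:A2,A12:A56,A4,A342:A12)"))
```

Step 1 answers "is this valid?" and step 2 answers "what are the parts?". Because `VALUE` contains only a non-capturing group, `re.findall` returns whole matches rather than sub-groups.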

A quick preview in PHP:

$text = 'should pass 

=SUM(A1,A11:A212,A12:A56,A342:A12,A3) 
=SUM(A11:A12,A12:a12,A34:A3) 
=SUM(A1,A2,A3) 
=SUM(A1) 

should fail 

=SUM(A11:A212:A2,A12:A56,A4,A342:A12)'; 

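// Build the pattern bottom-up: one value, a list of values, the whole formula.
$value  = "\w+\d+(?::\w+\d+)?"; 
$value_list = "$value(?:,$value)*"; 
$expression = "=SUM\(($value_list)\)"; 

// Step 1: find every valid formula; group 1 holds its value_list.
preg_match_all("/$expression/", $text, $matches); 

// Step 2: iterate over $value_list from $expression (group 1) and extract each value.
foreach ($matches[1] as $group1) { 
    preg_match_all("/$value/", $group1, $m); 
    print_r($m); 
} 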

Prints:

Array 
(
    [0] => Array 
     (
      [0] => A1 
      [1] => A11:A212 
      [2] => A12:A56 
      [3] => A342:A12 
      [4] => A3 
     ) 

) 
Array 
(
    [0] => Array 
     (
      [0] => A11:A12 
      [1] => A12:a12 
      [2] => A34:A3 
     ) 

) 
Array 
(
    [0] => Array 
     (
      [0] => A1 
      [1] => A2 
      [2] => A3 
     ) 

) 
Array 
(
    [0] => Array 
     (
      [0] => A1 
     ) 

)
0

I would actually split the string first. Something like:

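sub IsFormulaValid 
{ 
    my $str = $_[0]; 
    # Capture everything between the parentheses of =SUM(...).
    # If this match fails, $match is undef and the loop below never runs.
    (my $match) = $str =~ /^=SUM\(([^)]+)\)$/; 
    my @sumArgs = split(/,\s*/, $match); 
    my $valid = 1; 
    # Each argument must be a single cell (A1) or a range (A1:B2).
    foreach (@sumArgs) { 
        if ($_ !~ /^[a-z]+\d+(?::[a-z]+\d+)?$/i) { 
            $valid = 0; 
            last; 
        } 
    } 
    return $valid; 
} 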

Note that you could also check the validity of the match itself, and set $valid only when @sumArgs > 0. Perl input used for testing:

my @testInput; 

push(@testInput,'=SUM(A1,A11:A212,A12:A56,A342:A12,A3)'); 
push(@testInput,'=SUM(A11:A12,A12:a12,A34:A3)'); 
push(@testInput,'=SUM(A1,A2,A3)'); 
push(@testInput,'=SUM(A1)'); 
push(@testInput,'=SUM(A11:A212:A2,A12:A56,A4,A342:A12)'); 

foreach(@testInput){ 
    print "'$_'\n "; 
    print 'NOT ' if !IsFormulaValid($_); 
    print "VALID\n\n"; 
} 

Result:

'=SUM(A1,A11:A212,A12:A56,A342:A12,A3)' 
    VALID 

'=SUM(A11:A12,A12:a12,A34:A3)' 
    VALID 

'=SUM(A1,A2,A3)' 
    VALID 

'=SUM(A1)' 
    VALID 

'=SUM(A11:A212:A2,A12:A56,A4,A342:A12)' 
    NOT VALID
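
For comparison, here is a rough Python sketch of the same split-first idea that also rejects input where the outer `=SUM(...)` match itself fails. The names `OUTER`, `ARG` and `is_formula_valid` are hypothetical, and this is one possible reading of the approach rather than a translation of the Perl above.

```python
import re

OUTER = re.compile(r"=SUM\(([^)]+)\)")                          # =SUM( ... )
ARG = re.compile(r"[a-z]+\d+(?::[a-z]+\d+)?", re.IGNORECASE)    # A1 or A1:B2


def is_formula_valid(formula):
    """Validate by splitting the argument list and checking each piece."""
    outer = OUTER.fullmatch(formula)
    if outer is None:                  # not of the form =SUM(...)
        return False
    args = re.split(r",\s*", outer.group(1))
    # Every piece must be a single cell or a range; an empty piece
    # (for example after a trailing comma) fails the check.
    return all(ARG.fullmatch(arg) for arg in args)


for formula in ["=SUM(A1,A2,A3)", "=SUM(A11:A212:A2,A12:A56,A4,A342:A12)", "foo"]:
    print(formula, "VALID" if is_formula_valid(formula) else "NOT VALID")
```

The list produced by `re.split` is also the per-argument grouping the question asks for, so the same pass can return it instead of a boolean if needed.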