2011-09-28

Regex traceback

Say I have a regular expression:

match = re.search(pattern, content)
if not match:
    raise Exception('regex traceback')  # I want to raise the regex matching process here.

If the regex fails to match, I want the exception to show how the matching went: where the pattern stopped matching, at which stage, and so on. Is it possible to achieve this?

It looks like what you have works. Have you tested it?

Take a look at [Get the Python regex parse tree to debug your regex](http://stackoverflow.com/questions/101268/hidden-features-of-python/143636#143636) – agf
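The parse tree mentioned in that comment comes from the re.DEBUG flag, which makes re.compile() print a dump of the parsed pattern to standard output. A minimal sketch of capturing that dump as a string is below; dump_parse_tree is a hypothetical helper name, and the exact text of the dump varies between Python versions, so treat it as something to read rather than to parse.

```python
import contextlib
import io
import re


def dump_parse_tree(pattern, flags=0):
    """Return the text that re.compile() prints when re.DEBUG is set.

    Hypothetical helper: the re module only prints the dump, so stdout
    is redirected into a buffer while the pattern is compiled.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, flags | re.DEBUG)
    return buf.getvalue()


if __name__ == '__main__':
    # Shows how the parser sees the pattern (literals, repeats, groups).
    print(dump_parse_tree(r'\d{5} [a-i]+'))
```

This shows how the pattern was parsed, not where a particular string stopped matching, so it only covers part of what the question asks for.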

Answers

I have something that helps me debug complex regex patterns in my code.
Does this help you?

import re

# Test strings: the first four match the full pattern, the last four do not.
li = ('ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__12  ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n',

      '2 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877',

      '9 54879 bbdecddf antarctic__13 18:13pomodoro\t\t ghtr6798',

      'ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n',

      '25 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877',

      '9 54879 bbdeYddf antarctic__13 18:13pomodoro\t\t ghtr6798')

# The regex split into pieces; joining all of them gives the full pattern.
# Backslashes are doubled so that '\\d' reaches the regex engine as \d,
# while '\t' stays a real tab character.
tupleRE = ('^\\d',
           ' ',
           '\\d{5}',
           ' ',
           '[abcdefghi]+',
           ' ',
           '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)',
           '[a-z]+',
           '__',
           '[\\d]+',
           ' +',
           '[^\t]+',
           '\t\t',
           ' ',
           'ght',
           '(r[5-9]+|u[0-4]+)',
           '$')


def REtest(ch, tupleRE, flags=re.MULTILINE):
    """Try ever longer prefixes of tupleRE and report the first piece that fails."""
    lastmatch = None  # match of the longest prefix that still matches
    for n in range(len(tupleRE)):
        testmatch = re.search(''.join(tupleRE[:n + 1]), ch, flags)
        if testmatch:
            lastmatch = testmatch
            continue

        # tupleRE stops matching because of element n
        print('\n -*- tupleRE :\n')
        print('\n'.join(str(i).zfill(2) + ' ' + repr(u)
                        for i, u in enumerate(tupleRE[:n])))
        print(' --------------------------------')
        print(str(n).zfill(2) + ' ' + repr(tupleRE[n])
              + " doesn't match anymore from this ligne "
              + str(n) + ' of tupleRE')
        # Show the piece that follows the failing one, if there is one.
        print('\n'.join(str(n + 1 + j).zfill(2) + ' ' + repr(u)
                        for j, u in enumerate(tupleRE[n + 1:n + 2])))

        # If even the first piece fails, nothing matched at all.
        matching_portion = lastmatch.group() if lastmatch else ''
        fin_matching_portion = lastmatch.end() if lastmatch else 0
        matching_li = '\n'.join(map(repr,
                                    matching_portion.splitlines(True)[-5:]))
        print('\n\n -*- Part of the tested string which is concerned :\n\n'
              '######### matching_portion ########\n' + matching_li + '\n'
              '##### end of matching_portion #####\n'
              '-----------------------------------\n'
              '######## unmatching_portion #######')
        print('\n'.join(map(repr,
                            ch[fin_matching_portion:
                               fin_matching_portion + 300].splitlines(True))))
        break
    else:
        print('\n SUCCESS . The regex integrally matches.')


for x in li:
    print(' -*- Analyzed string :\n%r' % x)
    REtest(x, tupleRE)
    print('\nmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm')

Result

-*- Analyzed string : 
'ksjdhfqsd\n5 12478 abdefgcd ocean__12  ty--\t\t ghtr789\nqfgqrgqrg' 

    SUCCESS . The regex integrally matches. 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n' 

    SUCCESS . The regex integrally matches. 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'2 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877' 

    SUCCESS . The regex integrally matches. 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'9 54879 bbdecddf antarctic__13 18:13pomodoro\t\t ghtr6798' 

    SUCCESS . The regex integrally matches. 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'ksjdhfqsd\n5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\nqfgqrgqrg' 

    -*- tupleRE : 

00 '^\\d' 
01 ' ' 
02 '\\d{5}' 
03 ' ' 
04 '[abcdefghi]+' 
05 ' ' 
    -------------------------------- 
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)' doesn't match anymore from this ligne 6 of tupleRE 
07 '[a-z]+' 


    -*- Part of the tested string which is concerned : 

######### matching_portion ######## 
'5 12478 abdefgcd ' 
##### end of matching_portion ##### 
----------------------------------- 
######## unmatching_portion ####### 
'ocean__1247101247887 ty--\t\t ghtr789\n' 
'qfgqrgqrg' 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n' 

    -*- tupleRE : 

00 '^\\d' 
01 ' ' 
02 '\\d{5}' 
03 ' ' 
04 '[abcdefghi]+' 
05 ' ' 
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)' 
07 '[a-z]+' 
08 '__' 
09 '[\\d]+' 
10 ' +' 
11 '[^\t]+' 
12 '\t\t' 
13 ' ' 
14 'ght' 
15 '(r[5-9]+|u[0-4]+)' 
    -------------------------------- 
16 '$' doesn't match anymore from this ligne 16 of tupleRE 



    -*- Part of the tested string which is concerned : 

######### matching_portion ######## 
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12' 
##### end of matching_portion ##### 
----------------------------------- 
######## unmatching_portion ####### 
'940\n' 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'25 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877' 

    -*- tupleRE : 

00 '^\\d' 
    -------------------------------- 
01 ' ' doesn't match anymore from this ligne 1 of tupleRE 
02 '\\d{5}' 


    -*- Part of the tested string which is concerned : 

######### matching_portion ######## 
'2' 
##### end of matching_portion ##### 
----------------------------------- 
######## unmatching_portion ####### 
'5 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877' 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'9 54879 bbdeYddf antarctic__13 18:13pomodoro\t\t ghtr6798' 

    -*- tupleRE : 

00 '^\\d' 
01 ' ' 
02 '\\d{5}' 
03 ' ' 
04 '[abcdefghi]+' 
    -------------------------------- 
05 ' ' doesn't match anymore from this ligne 5 of tupleRE 
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)' 


    -*- Part of the tested string which is concerned : 

######### matching_portion ######## 
'9 54879 bbde' 
##### end of matching_portion ##### 
----------------------------------- 
######## unmatching_portion ####### 
'Yddf antarctic__13 18:13pomodoro\t\t ghtr6798' 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
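To get this information into the exception itself, as the question asks, the same prefix-by-prefix idea could be wrapped roughly as follows. This is only a sketch: RegexTraceback and search_or_explain are hypothetical names, not part of the re module, and it assumes that every prefix of the pieces joins into a valid pattern on its own.

```python
import re


class RegexTraceback(Exception):
    """Hypothetical exception recording where a piecewise regex stopped matching."""

    def __init__(self, index, piece, matched_end):
        self.index = index              # position of the failing piece
        self.piece = piece              # the failing piece itself
        self.matched_end = matched_end  # end offset of the last partial match
        super().__init__('piece %d (%r) stopped matching; the previous '
                         'pieces matched up to offset %d'
                         % (index, piece, matched_end))


def search_or_explain(pieces, text, flags=0):
    """Search with ''.join(pieces); raise RegexTraceback at the first failing prefix."""
    match = None
    matched_end = 0
    for n in range(1, len(pieces) + 1):
        match = re.search(''.join(pieces[:n]), text, flags)
        if match is None:
            raise RegexTraceback(n - 1, pieces[n - 1], matched_end)
        matched_end = match.end()
    return match


if __name__ == '__main__':
    try:
        search_or_explain(['^\\d', ' ', '\\d{5}'], '25 47890')
    except RegexTraceback as exc:
        print(exc)
```

Note that this reports the first prefix that no longer matches anywhere in the string, which is not a record of the engine's internal backtracking.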
Yes, I've used it and found it helpful, though it's a bit complicated :p


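If you need to test the re, you can wrap each part in a group followed by *, as in (sometext)*, and use that alongside the regex you actually need; you should then be able to pull out the position where it failed,

and then make use of the following match object attributes, as described on python.org:

pos: The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

endpos: The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.

lastindex: The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2 if applied to the same string.

lastgroup: The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.

re: The regular expression object whose match() or search() method produced this MatchObject instance.

string: The string passed to match() or search().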
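A quick way to see these attributes is to print them for one match. The pattern and input below are made up for illustration (one named group followed by one unnamed group), and describe is a hypothetical helper, not part of re.

```python
import re


def describe(m):
    """Collect the match object attributes listed above into a dict."""
    return {'pos': m.pos,
            'endpos': m.endpos,
            'lastindex': m.lastindex,
            'lastgroup': m.lastgroup,
            'pattern': m.re.pattern,
            'string': m.string}


# Illustrative pattern: group 1 is named 'word', group 2 has no name.
pattern = re.compile(r'(?P<word>[a-z]+)(\d+)?')
text = '  abc123  '

# Search only the slice text[2:8]; pos and endpos record these bounds.
info = describe(pattern.search(text, 2, 8))

if __name__ == '__main__':
    for name, value in info.items():
        print(name, '=', repr(value))
    # The lastindex examples from the description, applied to 'ab'.
    for expr in ('(a)b', '((a)(b))', '((ab))', '(a)(b)'):
        print(expr, re.match(expr, 'ab').lastindex)
```

Here the last group to match is the unnamed group 2, so lastindex is 2 while lastgroup is None; lastgroup is only useful when the groups are named.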

So, a very simple example:

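>>> import re
>>> mytextvar = 'the real thang'  # hypothetical sample input
>>> m1 = re.compile(r'the real thing')
>>> m2 = re.compile(r'(the)* (real)* (thing)*')
>>> if not m1.search(mytextvar):
...     res = m2.search(mytextvar)
...     # lastgroup would be None here because the groups are unnamed
...     print(res.lastindex)
...     # raise my exception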
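Since lastgroup is None whenever the last matched group has no name, a variant with named groups may be more informative. The sketch below assumes the same three-word pattern; the group names, last_matched_stage and check are hypothetical, and the lenient pattern can itself fail to match (it still needs the two spaces), which is handled by returning None.

```python
import re

STRICT = re.compile(r'the real thing')
# Same shape as the lenient pattern above, but every group is named.
LENIENT = re.compile(r'(?P<the>the)* (?P<real>real)* (?P<thing>thing)*')


def last_matched_stage(text):
    """Return the name of the last group that matched, or None."""
    m = LENIENT.search(text)
    return m.lastgroup if m else None


def check(text):
    """Raise ValueError naming the last stage that matched if STRICT fails."""
    if STRICT.search(text):
        return
    raise ValueError('regex failed; last stage that matched: %r'
                     % last_matched_stage(text))


if __name__ == '__main__':
    try:
        check('the real thang')
    except ValueError as exc:
        print(exc)
```

This only approximates a traceback: it tells you which optional stage was the last to match in the lenient pattern, not what the strict pattern's engine actually tried.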