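Regex redefinition error

I'm using Python and getting a group redefinition error. I know the group names are redefined, but logically a match could never reach both definitions at once. Is there a way around this? Thanks in advance for any help.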

File "/python-2.5/lib/python2.5/re.py", line 233, in _compile
    raise error, v # invalid expression
sre_constants.error: redefinition of group name 'id' as group 9; was group 6


import re

DOB_RE = r"(^|;)DOB +(?P<dob>\d{2}-\d{2}-\d{4})"
ID_RE = r"(^|;)ID +(?P<id>[A-Z0-9]{12})"
INFO_RE = r"- (?P<info>.*)"

# DOB_RE and ID_RE each appear three times, so their group names repeat
PERSON_RE = ("((" + DOB_RE + ".*" + ID_RE + ")|(" +
        ID_RE + ".*" + DOB_RE + ")|(" +
        DOB_RE + "|" + ID_RE + ")).*(" + INFO_RE + ")*")

PARSER = re.compile(PERSON_RE)  # this call raises the error above

samplestr1 = "garbage;DOB 10-10-2010;more garbage\nID PARI12345678;more garbage"
samplestr2 = "garbage;ID PARI12345678;more garbage\nDOB 10-10-2010;more garbage"
samplestr3 = "garbage;DOB 10-10-2010"
samplestr4 = "garbage;ID PARI12345678;more garbage- I am cool"

Source

2010-03-04 user285864

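Regular expression syntax simply does not allow multiple occurrences of a group with the same name. Groups that are not "reached" in a match are still defined; they are just "empty" (None).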
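To make that concrete, here is a minimal sketch (the patterns and the group names id, id0 and id1 are made up for illustration). It shows that the check happens at compile time, even when the two groups sit in mutually exclusive branches, and that with distinct names the branch that did not match simply reports None:

```python
import re

# Two alternatives that can never both match still may not share a group name.
try:
    re.compile(r"(?P<id>a)|(?P<id>b)")
    message = None
except re.error as exc:
    message = str(exc)
print(message)

# With distinct names, the group in the branch that did not match is None.
match = re.match(r"(?P<id0>a)|(?P<id1>b)", "b")
print(match.groupdict())
```

The exact wording of the error message can differ between Python versions, but it names the duplicated group and both group numbers.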

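So you have to change those names, e.g. to dob0, dob1, dob2 and id0, id1, id2 (once you have a match's group dictionary, you can easily "collapse" these sets of keys to build the dictionary you actually want).

For example, make DOB_RE a function rather than a constant, say: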

def DOB_RE(i): return r"(^|;)DOB +(?P<dob%s>\d{2}-\d{2}-\d{4})" % i

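and similarly for the others, then change the three occurrences of DOB_RE in the statement that computes PERSON_RE to DOB_RE(0), DOB_RE(1), and so on (and likewise for the others).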
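Put together, the renaming and the "collapse" step might look roughly like the sketch below. The helper names dob_re, id_re and collapse are hypothetical, the INFO part is left out for brevity, and the re.MULTILINE | re.DOTALL flags are an assumption added here so that the sample strings containing "\n" match across lines; none of that comes from the question.

```python
import re

# Hypothetical helpers: each call yields a pattern with a uniquely numbered group.
def dob_re(i):
    return r"(^|;)DOB +(?P<dob%s>\d{2}-\d{2}-\d{4})" % i

def id_re(i):
    return r"(^|;)ID +(?P<id%s>[A-Z0-9]{12})" % i

PERSON_RE = (
    "(" + dob_re(0) + ".*" + id_re(0) + ")|("
    + id_re(1) + ".*" + dob_re(1) + ")|("
    + dob_re(2) + "|" + id_re(2) + ")"
)

# Assumption: MULTILINE lets ^ match after "\n", DOTALL lets .* cross it.
PARSER = re.compile(PERSON_RE, re.MULTILINE | re.DOTALL)

def collapse(match):
    """Fold dob0/dob1/dob2 and id0/id1/id2 into plain 'dob' and 'id' keys."""
    person = {}
    for key, value in match.groupdict().items():
        if value is not None:
            person[key.rstrip("0123456789")] = value
    return person

sample = "garbage;ID PARI12345678;more garbage\nDOB 10-10-2010;more garbage"
print(collapse(PARSER.search(sample)))
```

Only one alternative can match, so at most one of the numbered variants of each name is non-None and the collapsed dictionary never has conflicting values.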

Source

2010-03-04 00:24:03

Maybe in this case it is better to iterate over a list of regexes.

>>> strs=[ 
... "garbage;DOB 10-10-2010;more garbage\nID PARI12345678;more garbage", 
... "garbage;ID PARI12345678;more garbage\nDOB 10-10-2010;more garbage", 
... "garbage;DOB 10-10-2010", 
... "garbage;ID PARI12345678;more garbage- I am cool"] 
>>> import re 
>>> 
>>> DOB_RE = "(^|;|\n)DOB +(?P<dob>\d{2}-\d{2}-\d{4})" 
>>> ID_RE = "(^|;|\n)ID +(?P<id>[A-Z0-9]{12})" 
>>> INFO_RE = "(- (?P<info>.*))?" 
>>> 
>>> REGEX = map(re.compile,[DOB_RE + ".*" + ID_RE + "[^-]*" + INFO_RE, 
...       ID_RE + ".*" + DOB_RE + "[^-]*" + INFO_RE, 
...       DOB_RE + "[^-]*" + INFO_RE, 
...       ID_RE + "[^-]*" + INFO_RE]) 
>>> 
>>> def get_person(s): 
...  for regex in REGEX: 
...   res = re.search(regex,s) 
...   if res: 
...    return res.groupdict() 
... 
>>> for s in strs: 
...  print get_person(s) 
... 
{'dob': '10-10-2010', 'info': None, 'id': 'PARI12345678'} 
{'dob': '10-10-2010', 'info': None, 'id': 'PARI12345678'} 
{'dob': '10-10-2010', 'info': None} 
{'info': 'I am cool', 'id': 'PARI12345678'}

Source

2010-03-04 01:24:47

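I was originally going to post a pyparsing example using the Each class (which picks out expressions that can appear in any order), but then I saw that there was garbage mixed in, so searching through your string with searchString seemed a better fit. This intrigued me, because searchString returns a sequence of ParseResults, one for each match (including any corresponding named results). So I thought, "What if I combine the returned ParseResults using sum? What a hack!", er, "How novel!" So here is a never-before-seen pyparsing hack: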

from pyparsing import * 
# define the separate expressions to be matched, with results names 
dob_ref = "DOB" + Regex(r"\d{2}-\d{2}-\d{4}")("dob") 
id_ref = "ID" + Word(alphanums,exact=12)("id") 
info_ref = "-" + restOfLine("info") 

# create an overall expression 
person_data = dob_ref | id_ref | info_ref 

for test in (samplestr1,samplestr2,samplestr3,samplestr4,): 
    # retrieve a list of separate matches 
    separate_results = person_data.searchString(test) 

    # combine the results using sum 
    # (NO ONE HAS EVER DONE THIS BEFORE!) 
    person = sum(separate_results, ParseResults([])) 

    # now we have a uber-ParseResults object! 
    print person.id 
    print person.dump() 
    print

Giving this output:

PARI12345678 
['DOB', '10-10-2010', 'ID', 'PARI12345678'] 
- dob: 10-10-2010 
- id: PARI12345678 

PARI12345678 
['ID', 'PARI12345678', 'DOB', '10-10-2010'] 
- dob: 10-10-2010 
- id: PARI12345678 


['DOB', '10-10-2010'] 
- dob: 10-10-2010 

PARI12345678 
['ID', 'PARI12345678', '-', ' I am cool'] 
- id: PARI12345678 
- info: I am cool

But I also speak regex. Here is a similar approach using re.

import re 

# define each individual re, with group names 
dobRE = r"DOB +(?P<dob>\d{2}-\d{2}-\d{4})" 
idRE = r"ID +(?P<id>[A-Z0-9]{12})" 
infoRE = r"- (?P<info>.*)" 

# one re to rule them all 
person_dataRE = re.compile('|'.join([dobRE, idRE, infoRE])) 

# using findall with person_dataRE will return a 3-tuple, so let's create 
# a tuple-merger 
merge = lambda a,b : tuple(aa or bb for aa,bb in zip(a,b)) 

# let's create a Person class to collect the different data bits 
# (or if you are running Py2.6, use a namedtuple)
class Person:
    def __init__(self,*args):
        self.dob, self.id, self.info = args
    def __str__(self):
        return "- id: %s\n- dob: %s\n- info: %s" % (self.id, self.dob, self.info)

for test in (samplestr1,samplestr2,samplestr3,samplestr4,): 
    # could have used reduce here, but let's err on the side of explicitness
    persontuple = ('','','')
    for data in person_dataRE.findall(test):
        persontuple = merge(persontuple,data)

    # make a person 
    person = Person(*persontuple) 

    # print out the collected results 
    print person.id 
    print person 
    print

With this output:

PARI12345678 
- id: PARI12345678 
- dob: 10-10-2010 
- info: 

PARI12345678 
- id: PARI12345678 
- dob: 10-10-2010 
- info: 


- id: 
- dob: 10-10-2010 
- info: 

PARI12345678 
- id: PARI12345678 
- dob: 
- info: I am cool
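For what it's worth, the namedtuple and reduce variants mentioned in the comments above could be sketched as follows. This is only an illustration written for Python 3 (where reduce lives in functools); the names person_data_re and parse_person are hypothetical, and the three patterns are copied from the re example.

```python
import re
from collections import namedtuple
from functools import reduce

# Field order must match the order of the groups in the combined pattern.
Person = namedtuple("Person", "dob id info")

person_data_re = re.compile("|".join([
    r"DOB +(?P<dob>\d{2}-\d{2}-\d{4})",
    r"ID +(?P<id>[A-Z0-9]{12})",
    r"- (?P<info>.*)",
]))

def merge(a, b):
    # Keep the first non-empty value seen for each position.
    return tuple(aa or bb for aa, bb in zip(a, b))

def parse_person(text):
    # findall returns one 3-tuple per match, with '' for groups that did not match.
    return Person(*reduce(merge, person_data_re.findall(text), ("", "", "")))

print(parse_person("garbage;ID PARI12345678;more garbage- I am cool"))
```

Because each name is defined only once in the alternation, no redefinition error arises; the order-independence comes from scanning the string repeatedly with findall rather than from the pattern itself.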

Source

2010-03-04 01:52:55 PaulMcG

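@Paul: Is pyparsing also available for Python 3? – 2010-03-04 11:08:38

@Tim: Yes, the current version includes a pyparsing_py3 module, which gets installed if you are running Python 3 (there is a benign installation bug, which I will fix in the next release). – PaulMcG 2010-03-04 13:04:29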
