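I originally planned to post a pyparsing example using the Each class (which picks out expressions that can appear in any order), but then I saw that there was junk mixed in, so scanning your string with searchString seemed a better fit. This intrigued me, because searchString returns a sequence of ParseResults, one for each match (including any corresponding named results). So I thought, "What if I combine the returned ParseResults using sum - what a hack!", er, "how novel!" So here is a never-before-seen pyparsing hack: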
from pyparsing import *

# define the separate expressions to be matched, with results names
dob_ref = "DOB" + Regex(r"\d{2}-\d{2}-\d{4}")("dob")
id_ref = "ID" + Word(alphanums, exact=12)("id")
info_ref = "-" + restOfLine("info")

# create an overall expression
person_data = dob_ref | id_ref | info_ref

# samplestr1..samplestr4 are the sample strings from the question
for test in (samplestr1, samplestr2, samplestr3, samplestr4,):
    # retrieve a list of separate matches
    separate_results = person_data.searchString(test)

    # combine the results using sum
    # (NO ONE HAS EVER DONE THIS BEFORE!)
    person = sum(separate_results, ParseResults([]))

    # now we have an uber-ParseResults object!
    print(person.id)
    print(person.dump())
    print("")
Giving this output:
PARI12345678
['DOB', '10-10-2010', 'ID', 'PARI12345678']
- dob: 10-10-2010
- id: PARI12345678
PARI12345678
['ID', 'PARI12345678', 'DOB', '10-10-2010']
- dob: 10-10-2010
- id: PARI12345678
['DOB', '10-10-2010']
- dob: 10-10-2010
PARI12345678
['ID', 'PARI12345678', '-', ' I am cool']
- id: PARI12345678
- info: I am cool
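The sum trick works because sum(iterable, start) simply applies + repeatedly, beginning from the start value, so any type that defines addition can be folded this way. Here is a minimal sketch of that idea without pyparsing. The Result class is a hypothetical stand-in written for illustration (it is not part of pyparsing), and the token values are copied from the output above:

```python
class Result:
    """Hypothetical stand-in for ParseResults: a token list plus named values."""

    def __init__(self, tokens=(), named=None):
        self.tokens = list(tokens)
        self.named = dict(named or {})

    def __add__(self, other):
        # concatenate the tokens and merge the names; later names win
        merged = dict(self.named)
        merged.update(other.named)
        return Result(self.tokens + other.tokens, merged)


matches = [
    Result(["DOB", "10-10-2010"], {"dob": "10-10-2010"}),
    Result(["ID", "PARI12345678"], {"id": "PARI12345678"}),
]

# The second argument is the start value. Without it, sum starts from 0,
# and 0 + Result raises TypeError for this class.
combined = sum(matches, Result())
print(combined.tokens)
print(combined.named)
```

The empty Result() plays the same role as ParseResults([]) in the pyparsing code: it gives sum something of the right type to start adding onto.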
But I also speak regex. Here is a similar approach using re.
import re

# define each individual re, with group names
dobRE = r"DOB +(?P<dob>\d{2}-\d{2}-\d{4})"
idRE = r"ID +(?P<id>[A-Z0-9]{12})"
infoRE = r"- (?P<info>.*)"

# one re to rule them all
person_dataRE = re.compile('|'.join([dobRE, idRE, infoRE]))

# using findall with person_dataRE will return a 3-tuple, so let's create
# a tuple-merger
merge = lambda a, b: tuple(aa or bb for aa, bb in zip(a, b))

# let's create a Person class to collect the different data bits
# (or if you are running Py2.6, use a namedtuple)
class Person:
    def __init__(self, *args):
        self.dob, self.id, self.info = args
    def __str__(self):
        return "- id: %s\n- dob: %s\n- info: %s" % (self.id, self.dob, self.info)

# samplestr1..samplestr4 are the sample strings from the question
for test in (samplestr1, samplestr2, samplestr3, samplestr4,):
    # could have used reduce here, but let's err on the side of explicitness
    persontuple = ('', '', '')
    for data in person_dataRE.findall(test):
        persontuple = merge(persontuple, data)

    # make a person
    person = Person(*persontuple)

    # print out the collected results
    print(person.id)
    print(person)
    print("")
With this output:
PARI12345678
- id: PARI12345678
- dob: 10-10-2010
- info:
PARI12345678
- id: PARI12345678
- dob: 10-10-2010
- info:
- id:
- dob: 10-10-2010
- info:
PARI12345678
- id: PARI12345678
- dob:
- info: I am cool
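The comments in the re version mention two alternatives: reduce for the merge loop and a namedtuple in place of the Person class. A rough sketch of how those might look together follows. The parse_person helper and the sample string are assumptions made up for illustration; the sample is not one of the strings from the question.

```python
import re
from collections import namedtuple
from functools import reduce

dobRE = r"DOB +(?P<dob>\d{2}-\d{2}-\d{4})"
idRE = r"ID +(?P<id>[A-Z0-9]{12})"
infoRE = r"- (?P<info>.*)"
person_dataRE = re.compile('|'.join([dobRE, idRE, infoRE]))

# field order matches the order of the groups in person_dataRE
Person = namedtuple("Person", "dob id info")


def merge(a, b):
    # keep whichever side has a non-empty value in each position
    return tuple(aa or bb for aa, bb in zip(a, b))


def parse_person(text):
    # hypothetical helper: fold every findall() 3-tuple into one,
    # starting from a tuple of empty strings
    return Person(*reduce(merge, person_dataRE.findall(text), ('', '', '')))


# hypothetical input, not one of the sample strings from the question
sample = "junk DOB 10-10-2010 more junk ID PARI12345678 - I am cool"
person = parse_person(sample)
print(person.id)
print(person)
```

The initial ('', '', '') passed to reduce means a string with no matches still yields a Person with three empty fields, which mirrors what the explicit loop does.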
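@Paul: Is pyparsing also available for Python 3? – 2010-03-04 11:08:38
@Tim: Yes, the current release includes a pyparsing_py3 module, which gets installed if you are running Python 3 (there is a benign installation error, which I will fix in the next release). – PaulMcG 2010-03-04 13:04:29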