Alex mentioned pyparsing, so here is a pyparsing approach to your same problem:
from pyparsing import Word, Suppress, Regex, oneOf, SkipTo
import datetime

# punctuation is matched but kept out of the results
DASHES = Word('-').suppress()
LPAR,RPAR,AT = map(Suppress,"()@")
date = Regex(r'\d{2}/\d{2}/\d{4}')
time = Regex(r'\d{2}:\d{2}:\d{2}')
status = oneOf("Busy Available Idle Offline Unavailable")

# the two known phrasings of a status change
statechange1 = 'changed status from' + status('fromstate') + 'to' + status('tostate')
statechange2 = 'became' + status('tostate')
linefmt = (DASHES + SkipTo('(')('name') + LPAR + SkipTo(RPAR)('email') + RPAR +
            (statechange1 | statechange2) +
            AT + date('date') + time('time') + DASHES)

def convertFields(tokens):
    # parse action: fill in defaults, tidy fields, build a datetime
    if 'fromstate' not in tokens:
        tokens['fromstate'] = 'NULL'
    tokens['name'] = tokens.name.strip()
    tokens['email'] = tokens.email.strip()
    d,mon,yr = map(int, tokens.date.split('/'))
    h,m,s = map(int, tokens.time.split(':'))
    tokens['datetime'] = datetime.datetime(yr, mon, d, h, m, s)
linefmt.setParseAction(convertFields)

# text holds the log lines from the question
for line in text.splitlines():
    fields = linefmt.parseString(line)
    print("%(name)s/%(email)s %(fromstate)-10.10s %(tostate)-10.10s %(datetime)s" % fields)
Prints:
Mark Grey/[email protected] Busy Available 2010-07-14 16:32:36
Silvia Pablo/[email protected] NULL Available 2010-07-14 16:32:39
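pyparsing lets you attach names to the fields of the results (much like the named groups in Tim Pietzcker's RE-styled answer), and also supports parse-time actions that operate on or modify the parsed tokens. Note how the separate date and time fields are converted into a true datetime object, already converted and ready for processing once parsing is done, with no extra muss or fuss.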
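To isolate those two features, here is a minimal sketch that uses only the date and time pieces of the grammar above. The names `timestamp` and `to_datetime` are made up for this illustration and are not part of the answer's parser.

```python
import datetime
from pyparsing import Regex

# Calling an expression with a string attaches that results name to its token.
date = Regex(r'\d{2}/\d{2}/\d{4}')('date')
time = Regex(r'\d{2}:\d{2}:\d{2}')('time')
timestamp = date + time  # hypothetical helper expression for this sketch

def to_datetime(tokens):
    # Parse action: runs at parse time and adds a combined 'datetime' field.
    d, mon, yr = map(int, tokens.date.split('/'))
    h, m, s = map(int, tokens.time.split(':'))
    tokens['datetime'] = datetime.datetime(yr, mon, d, h, m, s)

timestamp.setParseAction(to_datetime)

result = timestamp.parseString("14/07/2010 16:32:36")
print(result.date)         # named field, still a string
print(result['datetime'])  # datetime object built by the parse action
```

Named fields can be read either as attributes (`result.date`) or by key (`result['datetime']`), whichever reads better in the calling code.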
Here is a modified loop that just dumps out the parsed tokens and the named fields for each line:
for line in text.splitlines():
    fields = linefmt.parseString(line)
    print(fields.dump())
Prints:
['Mark Grey ', '[email protected]', 'changed status from', 'Busy', 'to', 'Available', '14/07/2010', '16:32:36']
- date: 14/07/2010
- datetime: 2010-07-14 16:32:36
- email: [email protected]
- fromstate: Busy
- name: Mark Grey
- time: 16:32:36
- tostate: Available
['Silvia Pablo ', '[email protected]', 'became', 'Available', '14/07/2010', '16:32:39']
- date: 14/07/2010
- datetime: 2010-07-14 16:32:39
- email: [email protected]
- fromstate: NULL
- name: Silvia Pablo
- time: 16:32:39
- tostate: Available
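I suspect that as you continue working on this problem, you will find other variations in the input text format that specify how a user's status changed. In that case, you just add another definition like statechange1 or statechange2, and insert it into linefmt alongside the others. I feel that pyparsing's way of structuring the parser definition helps developers come back to a parser after things have changed, and easily extend their parsing program.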
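As a rough sketch of that kind of extension, suppose a third phrasing such as "is now Available" turned up in the log. That phrasing and the name `statechange3` are assumptions made up for illustration; they do not come from the original data.

```python
from pyparsing import oneOf

status = oneOf("Busy Available Idle Offline Unavailable")
statechange1 = 'changed status from' + status('fromstate') + 'to' + status('tostate')
statechange2 = 'became' + status('tostate')
# Hypothetical third phrasing, not taken from the original log format.
statechange3 = 'is now' + status('tostate')

# The new alternative is simply OR-ed in with the existing ones.
statechange = statechange1 | statechange2 | statechange3

samples = ("changed status from Busy to Idle", "became Offline", "is now Available")
for sample in samples:
    parsed = statechange.parseString(sample)
    # fromstate is only present for the first phrasing
    print(parsed.get('fromstate', 'NULL'), '->', parsed.tostate)
```

In the full parser, the only other change would be replacing `(statechange1 | statechange2)` inside linefmt with the three-way alternative; the parse action and the output loop stay as they are.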
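NaN ........... – 2010-07-15 07:38:16
Hmm, that is Not a Number all right :) – 2010-07-15 07:54:00
Edit note: Marcelo and Tim have given you good answers for what you want to do. Here is the documentation for the regular expression library included with Python, which can help you extend your code further: http://docs.python.org/library/re.html – 2010-07-15 07:51:17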