2017-06-15 95 views
1

Hi, I have a log file with the content shown below. What is the best Python regular expression for extracting specific text?

[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ] 
ViewPostImeInputStage processKey 0 

[ 06-15 14:07:48.397 3539: 4649 D/AudioService ] 
active stream is 0x8 

[ 06-15 14:07:48.407 4277: 4293 D/vol.VolumeDialogControl.VC ] 
isSafeVolumeDialogShowing : false 

I want to extract some information from the log file. The expected format is as follows:

[('06-15 14:07:48.377', '15012', 'D', 'ViewRootImpl', 'ViewPostImeInputStage processKey 0'), 
('06-15 14:07:48.397', '3539', '4649', 'D', 'AudioService', 'active stream is 0x8'), 
('06-15 14:07:48.407', '4277', '4293', 'D', 'vol.VolumeDialogControl.VC', 'isSafeVolumeDialogShowing : false')] 

Question: what is the best Python regular expression for extracting the information in the expected format? Thank you very much!

Update: I have tried the following code:

import re 
regex = r"(\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3})\s(\d+).*(\w{1})/(.*)\](.*)" 
data = [g.groups() for g in re.finditer(regex, log, re.M | re.I)] 

The result I got is:

data=[('06-15 14:07:48.377', '15012', 'D', 'ViewRootImpl', '\r'), (
'06-15 14:07:48.397', '3539', 'D', 'AudioService', '\r'), ('06-15 14:07:48.407', 
'4277', 'D', 'vol.VolumeDialogControl.VC', '\r')] 

I cannot get the last element (the message text).

+0

Please provide the code you have already tried. – dhdavvie

+0

Please show your attempt first. Also, what you need is string formatting, since you are using the whole string. – wolfsgang

+3

The best regular expression is the one you write yourself, so that you understand it when it needs adjusting later. –

Answers

2

Use the following approach:

import re

with open('yourlogfile', 'r') as log: 
    lines = log.read() 

# Join each "[ header ]" line with the message line that follows it 
result = re.sub(r'^\[ (\S+) *(\S+) *(\d+): *(\d+) *([A-Z]+)\/(\S+) \]\n([^\n]+)\n?', 
    r'\1 \2 \3 \4 \5 \6 \7', lines, flags=re.MULTILINE) 

print(result) 

Output:

06-15 14:07:48.377 15012 15012 D ViewRootImpl ViewPostImeInputStage processKey 0 
06-15 14:07:48.397 3539 4649 D AudioService active stream is 0x8 
06-15 14:07:48.407 4277 4293 D vol.VolumeDialogControl.VC isSafeVolumeDialogShowing : false 

To get the result as a list of matches, use the re.findall() function:

... 
result = re.findall(r'^\[ (\S+) *(\S+) *(\d+): *(\d+) *([A-Z]+)\/(\S+) \]\n([^\n]+)\n?', lines, flags=re.MULTILINE) 
print(result) 

Output:

[('06-15', '14:07:48.377', '15012', '15012', 'D', 'ViewRootImpl', 'ViewPostImeInputStage processKey 0'), ('06-15', '14:07:48.397', '3539', '4649', 'D', 'AudioService', 'active stream is 0x8'), ('06-15', '14:07:48.407', '4277', '4293', 'D', 'vol.VolumeDialogControl.VC', 'isSafeVolumeDialogShowing : false')] 
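
If the date and time should stay together in one field and only the first of the two ID numbers is wanted, which is the shape the asker describes, the pattern could be adjusted along the following lines. This is only a sketch, not the answerer's code: the names `PATTERN` and `parse_log` are made up for illustration, and it assumes every header line is followed directly by exactly one message line. The `\r?` parts are there because the `'\r'` in the asker's result suggests the file uses Windows line endings.

```python
import re

# Hypothetical pattern: one group for "MM-DD HH:MM:SS.mmm", one for the first
# ID, then the level, the tag, and the message found on the following line.
PATTERN = re.compile(
    r'^\[ (\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) +(\d+): *\d+ +([A-Z])/(\S+) \][ \t]*\r?\n'
    r'([^\r\n]*?)[ \t]*\r?$',
    re.MULTILINE,
)


def parse_log(text):
    """Return a list of (timestamp, id, level, tag, message) tuples."""
    return PATTERN.findall(text)


# Sample input with Windows (CRLF) line endings, as assumed above.
sample = (
    "[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]\r\n"
    "ViewPostImeInputStage processKey 0\r\n"
    "\r\n"
    "[ 06-15 14:07:48.397 3539: 4649 D/AudioService ]\r\n"
    "active stream is 0x8\r\n"
)

for entry in parse_log(sample):
    print(entry)
```

Escaping the dot before the milliseconds (`\.`) and matching the line break explicitly are the two changes that matter here: `.` does not match a newline by default, so a trailing `(.*)` on the header line can never reach the message on the next line.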
+0

Thanks for your answer. I have one more question: how can I get the result as a list like the one below? I tried, but I cannot get the last element. [('06-15 14:07:48.377', '15012', 'D', 'ViewRootImpl', 'ViewPostImeInputStage processKey 0'), ('06-15 14:07:48.397', '3539', 'D', 'AudioService', 'active stream is 0x8'), ('06-15 14:07:48.407', '4277', 'D', 'vol.VolumeDialogControl.VC', 'isSafeVolumeDialogShowing : false')] – nanci

+0

@nanci, see my update. – RomanPerekhrest

+0

Hi, I have updated my question, sorry about that. – nanci

1
#!/usr/bin/python2 
# -*- coding: utf-8 -*- 

import re 

input = """ 
[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ] 
ViewPostImeInputStage processKey 0 

[ 06-15 14:07:48.397 3539: 4649 D/AudioService ] 
active stream is 0x8 

[ 06-15 14:07:48.407 4277: 4293 D/vol.VolumeDialogControl.VC ] 
isSafeVolumeDialogShowing : false 
""" 

# join each header line with its message: collapse the line break after "]" 
input = re.sub(r'(\])\s+', r'\1 ', input) 

# replace "D/Something ] " -> "D Something " 
input = re.sub(r'([A-Z]{1})/([^\s]+)\s+\]\s+', r'\1 \2 ', input) 

# remove the leading "[ " in front of the date 
input = re.sub(r'\[\s+([0-9]{2}\-[0-9]{2})', r'\1', input) 

print(input) 

Output

06-15 14:07:48.377 15012:15012 D ViewRootImpl ViewPostImeInputStage processKey 0 

06-15 14:07:48.397 3539: 4649 D AudioService active stream is 0x8 

06-15 14:07:48.407 4277: 4293 D vol.VolumeDialogControl.VC isSafeVolumeDialogShowing : false 
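
Once the log has been flattened to one entry per line as above, each line can be split back into separate fields with one more pattern. The following is a minimal sketch under assumptions: `LINE` and `split_line` are hypothetical names, and the tag is assumed to contain no whitespace, as in the sample log.

```python
import re

# Hypothetical pattern for one flattened line:
# "date time id:id level tag message"
LINE = re.compile(r'^(\S+ \S+) +(\d+): *(\d+) +([A-Z]) +(\S+) +(.*?)\s*$')


def split_line(line):
    """Return the fields of one flattened line as a tuple, or None if it does not match."""
    match = LINE.match(line)
    return match.groups() if match else None


flattened = [
    "06-15 14:07:48.377 15012:15012 D ViewRootImpl ViewPostImeInputStage processKey 0",
    "06-15 14:07:48.397 3539: 4649 D AudioService active stream is 0x8",
]

for line in flattened:
    print(split_line(line))
```

A plain `str.split()` would not be enough here, because the second ID may be padded with a space (`3539: 4649`) and the message itself contains spaces.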
+1

Hi zital, thank you very much. – nanci
