2010-02-02 52 views
5

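Parsing multi-line, variable-length log files

I'd like to use a "grep" or "pcregrep -M" style solution to parse a log file that fits the following parameters:

  • Each log entry can be multiple lines long
  • The first line of each log entry contains the key I want to search for
  • Each key appears on more than one line of the file

So in the example below, I want to return every line that has KEY1 on it, plus all the supporting lines below it, up to the next log message.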

 
Log file: 
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext 
     blah 
     blah2 T 
     blah3 T 
     blah4 F 
     blah5 F 
     blah6 
     blah7 
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse 
01 Feb 2010 - 10:39:01.758, DEBUG - KEY2:randomtest 
this is a test 
01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here 
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here 
this is another multiline log entry 
keeps on going 
but not as long as before 
01 Feb 2010 - 10:39:01.763, DEBUG - KEY2:testing 
test test test 
end of key2 
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
okay enough 
01 Feb 2010 - 10:39:01.762, DEBUG - KEY3:and so on 
and on 
 
Desired output of searching for KEY1: 
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext 
     blah 
     blah2 T 
     blah3 T 
     blah4 F 
     blah5 F 
     blah6 
     blah7 
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse 

01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here 
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here 
this is another multiline log entry 
keeps on going 
but not as long as before 
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
okay enough 

I tried something like:
pcregrep -M 'KEY1(.*\n)+' logfile
but it definitely doesn't work right.

+0

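What defines the end of an entry? Is it guaranteed that lines inside an entry never start with a digit, while the line that starts a new entry always does? – 2010-02-02 06:00:47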

+0

It would probably be easier to use a small script rather than a regex. Any reason not to do that? – 2010-02-02 06:02:43

Answers

-1

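Adding to ghostdog74's answer (thank you very much, by the way, it works great).

This version takes its input on the command line in the form "./parse file key" and handles the ERROR log level as well as DEBUG.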

 
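#!/bin/bash 
# Usage: ./parse file key
awk -v key="$2" ' 
# A DEBUG or ERROR line without the key switches printing off
$0 ~ /DEBUG|ERROR/ && $0 !~ key { f = 0 } 
# Any line containing the key switches printing on
$0 ~ key { f = 1 } 
f { print } ' "$1" 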
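
The script above recognizes a new entry by its log level, so each additional level has to be added to the pattern. A possible variation, sketched here under the assumption that every entry starts with a timestamp like "01 Feb 2010 - " (as in the sample log), keys off the timestamp instead and checks the key only on that first line. The function name `filter_entries` is hypothetical, and `index()` treats the key as a literal string rather than a regex.

```shell
#!/bin/bash
# Hypothetical helper: print every entry whose first line contains KEY.
# Usage: filter_entries FILE KEY
filter_entries() {
    awk -v key="$2" '
    # Assumption: an entry starts with a timestamp such as "01 Feb 2010 - "
    /^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9] - / {
        # Decide once per entry, looking at the first line only
        show = (index($0, key) > 0)
    }
    # Print the first line and its continuation lines while the flag is set
    show { print }
    ' "$1"
}
```

Saved in a script whose last line is `filter_entries "$1" "$2"`, this would be invoked as `./parse logfile KEY1`. Because the key is checked only on timestamped lines, a continuation line that happens to mention the key does not switch printing on.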
+2

So consider accepting the answer; you could post this in your question instead. – ghostdog74 2010-02-02 07:20:29

+0

I would, but it says I can't accept my own answer for 2 days. – Urgo 2010-02-02 07:30:10

+0

Urgo, this post only tweaks ghostdog74's answer. You should mark ghostdog74's as the answer and edit your original question to add this tweak. – adam 2015-04-27 01:34:56

7

If you are on *nix, you can use the shell:

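#!/bin/bash 
read -r -p "Enter key: " key 
awk -v key="$key" ' 
# A DEBUG line without the key switches printing off
$0 ~ /DEBUG/ && $0 !~ key { f = 0 } 
# Any line containing the key switches printing on
$0 ~ key { f = 1 } 
f { print } ' file 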

Output

$ cat file 
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext 
     blah          
     blah2 T          
     blah3 T          
     blah4 F          
     blah5 F          
     blah6          
     blah7          
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse 
01 Feb 2010 - 10:39:01.758, DEBUG - KEY2:randomtest 
this is a test          
01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here 
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here 
this is another multiline log entry      
keeps on going           
but not as long as before        
01 Feb 2010 - 10:39:01.763, DEBUG - KEY2:testing  
test test test           
end of key2            
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going 
and going               
and going               
and going               
and going               
and going               
and going               
and going               
and going               
and going               
and going 
and going 
and going 
okay enough 
01 Feb 2010 - 10:39:01.762, DEBUG - KEY3:and so on 
and on 

$ ./shell.sh 
Enter key: KEY1 
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext 
     blah 
     blah2 T 
     blah3 T 
     blah4 F 
     blah5 F 
     blah6 
     blah7 
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse 
01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here 
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here 
this is another multiline log entry 
keeps on going 
but not as long as before 
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
and going 
okay enough 
0

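I had a similar requirement and decided to write a small tool (in .NET) that parses log files for me and writes the result to standard output.

Maybe you'll find it useful. It works on Windows and Linux (Mono).

See here: https://github.com/iohn2000/ParLog

A tool to filter log files for log entries that contain a specific (regex) pattern. It also works for multi-line log entries. For example: show only the log entries from a specific workflow instance. The result is written to standard output; use '>' to redirect it to a file.

The default startPattern is: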

^[0-9]{2} [\w]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} 

This corresponds to a date format such as: 04 Feb 2017 15:02:50,778
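
The idea of a start pattern plus a filter pattern can also be sketched in awk. The following is only an illustration of that idea, not ParLog's implementation: it buffers each entry and prints it if any of its lines matches the filter. The function name `filter_by_pattern` and the default start pattern (a two-digit day followed by a three-letter month) are assumptions. Note that awk's `-v` option interprets backslash escapes, so a pattern containing backslashes would need them doubled.

```shell
#!/bin/bash
# Hypothetical helper, not ParLog itself.
# Usage: filter_by_pattern FILE PATTERN [START_PATTERN]
filter_by_pattern() {
    # Assumed default: an entry starts with a day and month, e.g. "04 Feb "
    awk -v pat="$2" -v start="${3:-^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] }" '
    # Print the buffered entry if one of its lines matched, then reset
    function emit() { if (hit) printf "%s", buf; buf = ""; hit = 0 }
    $0 ~ start { emit() }        # a new entry begins: handle the previous one
    { buf = buf $0 "\n" }        # collect the current line
    $0 ~ pat { hit = 1 }         # remember that this entry matched
    END { emit() }               # handle the last entry
    ' "$1"
}
```

Unlike the flag-based scripts earlier in the thread, this prints an entry even when the match is on a continuation line, at the cost of holding one entry in memory at a time.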

The parameters are:

f:wildcard  a file name or wildcard for multiple files 
p:pattern  the regex pattern to filter the file(s) 
s:startPattern regex pattern to define when a new log entry starts 

Example:

ParLog.exe -f=*.log -p=findMe