Shell: script to group strings by a substring

2011-05-24

I have a program (changing it is unfortunately not an option) that writes log files of more than 500k lines.

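I want to group the lines of the log file together (and then sort those groups) based on a substring within each line.

For example, I have lines like the following: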

SELECT something WHERE TIM BETWEEN '*' AND '*' AND something; 

What I'm looking to group on is TIM BETWEEN '*' AND '*', where the values in place of * match between lines. For example:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 

would be grouped in the output as, for example:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 

Each group should also be sorted on the whole string, so that lines whose 'something' parts are similar end up next to each other.

I have been trying to put together a shell script that produces the output I want from the log file, but without any success!

Edit: I should also mention that 'something' can be several words, for example:

SELECT blah1, blah2 or SELECT blah1, blah2, blah3 

Answers

1

You should probably be able to use sort:

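sort -o outputfile -k6,6 -k8,8 -k2,2 inputfile

Here -k6,6 is the first date column, -k8,8 is the last date column and -k2,2 is the 'something' column. The older +POS -POS notation (for example +5 -6) is obsolete; POSIX sort uses -k with 1-based field numbers.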

(PS: untested)
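
As a minimal sketch of the field-key idea, assuming a POSIX sort and assuming that every line has its two dates in whitespace-separated fields 6 and 8 (which only holds when 'something' is a single word), the keys could be written as below. The function name sort_by_dates is hypothetical and used only for illustration.

```shell
# Hypothetical helper, not part of the original answer.
# Assumption: dates sit in fields 6 and 8, "something" in field 2.
sort_by_dates() {
    # LC_ALL=C gives plain byte-order comparison, which sorts
    # ISO dates (YYYY-MM-DD) chronologically.
    LC_ALL=C sort -k6,6 -k8,8 -k2,2 "$@"
}

# Sample lines taken from the question, fed on standard input.
sort_by_dates <<'EOF'
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
EOF
```

Because the keys are tied to fixed field numbers, this breaks as soon as the number of words before the dates varies from line to line.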

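Thanks for the answer, Kristofer, but I can't rely on the number of columns, or on the TIM BETWEEN '*' AND '*' block being at the same position in every line. I have edited the original question to reflect this. – Tristan 2011-05-24 09:26:44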

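You can set the "delimiter" to something other than whitespace to define where a column ends. That way you may be able to do a multi-step sort, changing the delimiter between sorts (if a word can be used as a delimiter). -t changes the delimiter. – Kristofer 2011-05-24 10:39:34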

0

You have to pre-filter the data and turn it into something that sort can work with.

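# Replace the first BETWEEN and the first AND with "|", sort on the two date fields, then restore the words.
awk '{sub(/BETWEEN/, "|"); sub(/AND/, "|"); print}' logFile \
| sort -t"|" -k2,2 -k3,3 \
| sed 's/|/BETWEEN/;s/|/AND/'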

Output

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 
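
A variation on the same decorate, sort, undecorate idea may be more robust when 'something' contains several words, or when the line already contains a "|" or an earlier AND. The sketch below copies the TIM BETWEEN '...' AND '...' block to the front of each line as a tab-separated key, sorts on that key and then on the rest of the line, and finally strips the key again. The function name group_by_between is hypothetical, and the sketch assumes the dates are always wrapped in single quotes as in the sample lines.

```shell
# Hypothetical helper, not part of the original answer.
group_by_between() {
    awk -v q="'" '
        BEGIN {
            # Regex for: TIM BETWEEN <quoted value> AND <quoted value>
            re = "TIM BETWEEN " q "[^" q "]*" q " AND " q "[^" q "]*" q
        }
        {
            # Use the matched block as the sort key; lines without it get an empty key.
            key = match($0, re) ? substr($0, RSTART, RLENGTH) : ""
            print key "\t" $0
        }
    ' "$@" |
    # Sort by the key first, then by the whole original line.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2 |
    # Drop the key again, keeping the original line unchanged.
    cut -f2-
}

# Sample lines taken from the question, fed on standard input.
group_by_between <<'EOF'
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
EOF
```

For a real log this could be called as group_by_between logFile > grouped.log, where both file names are placeholders. Lines that do not contain the block are kept and sort ahead of all groups.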

I hope this helps.