2012-10-04

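Linux: search for a pattern and print its count

I am trying to grep all occurrences of geoIs. My question is: how can I list the distinct values of geoIs together with the count of each?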

The input looks like this:

GeoIs:"Paramount","sumthing else" 
GeoIs:"undefined","sumthing else" 
GeoIs:"undefined","sumthing else" 
GeoIs:"178","sumthing else" 
GeoIs:"178","sumthing else" 
and many more 
... 
... 

Expected result:

GeoIs:"Paramount" 1 
GeoIs:"undefined" 2 
GeoIs:"178" 2 

Command I tried:

zcat file.gz | grep -P '"geoIs":".*?.undefined*?"' | sort -u -T.|wc -l

EDIT1:

geoIs can be found in the following string:

012-10-02 09:32:45{"e":{"ec":100001,"st":1349170352455,"bd":"Mozilla%2F5.0%20(Windows%20NT%206.1)%20AppleWebKit%2F537.4%20(KHTMf01f02008592~rt%2366.657~rv%2366.228~as%2317~st%231349170293955~cat%231349170352431~sp%23as~c%2334~pat%231349128562942","smplCookie":"undefined","geoIPAddress":"122.107.154.58","geoCountry":"australia","geoCity":"Vermont","geoRegion":"Victoria","geoPostalCode":"undefined","geoLatitude":"undefined","geoLongitude":"undefined","geoMetro":"0","geoArea":"0","geoIs"}} 

Answers


To produce a frequency table, use

sort | uniq -c | sort -n 

For the sample data you provided, I would use

zcat file.gz | cut -f1 -d, | sort | uniq -c | sort -n 
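A minimal sketch of the same pipeline wrapped in a shell function, so it can be tried on the sample lines from the question without file.gz. The function name geo_freq is hypothetical, and, like the command above, it assumes the GeoIs value is the first comma-separated field.

```shell
#!/bin/sh
# geo_freq (hypothetical name): read lines on stdin and print a frequency
# table of the first comma-separated field, least frequent values first.
geo_freq() {
  cut -f1 -d, |  # keep only the GeoIs:"..." part of each line
    sort |       # uniq -c only merges adjacent duplicates, so sort first
    uniq -c |    # prefix each distinct value with its count
    sort -n      # order the table by ascending count
}

# Sample lines copied from the question.
printf '%s\n' \
  'GeoIs:"Paramount","sumthing else"' \
  'GeoIs:"undefined","sumthing else"' \
  'GeoIs:"undefined","sumthing else"' \
  'GeoIs:"178","sumthing else"' \
  'GeoIs:"178","sumthing else"' |
  geo_freq
```

On the real data this would be zcat file.gz | geo_freq. The count column printed by uniq -c is right-aligned with leading spaces, and its width differs between GNU and BSD uniq.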

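Update: if the key can appear anywhere on the line, extract only the "key":"value" pair with grep -o (replace searchstring with the key name, e.g. geoIs):

zcat file.gz | grep -o '"searchstring":"[^"]*"'| sort | uniq -c | sort -n 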
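A minimal sketch of the grep -o approach as a function that takes the key name as an argument. count_key is a hypothetical name, the sample lines are made up (the geoIs entry in the EDIT1 log line has no value to count), and it assumes values are double-quoted strings without escaped quotes inside them.

```shell
#!/bin/sh
# count_key (hypothetical name): count the distinct values of a "key":"value"
# pair wherever it appears on each input line.
# Assumes values are double-quoted and contain no escaped quotes.
count_key() {
  grep -o "\"$1\":\"[^\"]*\"" |  # print each matching "key":"value" on its own line
    sort |
    uniq -c |
    sort -n
}

# Made-up sample: the geoIs pair sits at a different position on each line.
printf '%s\n' \
  '{"geoCity":"Vermont","geoIs":"undefined"}' \
  '{"geoIs":"178","geoCity":"Vermont"}' \
  '{"geoArea":"0","geoIs":"178","geoMetro":"0"}' |
  count_key geoIs
```

The key is pasted into the regular expression as-is, so a key containing regex metacharacters would need escaping. grep -o prints every match, so a line containing the key twice is counted twice.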

Can you show me the complete command? – Rajeev


@Rajeev: See the update. – choroba


Sorry, but it is not always the first field; it is spread across the line. I will edit the question with the input. Please take a look. – Rajeev


An awk alternative:

awk -F, '{a[$1]++;}END{for(x in a)if(x)print x,a[x]}' file 


kent$ echo 'GeoIsp:"Paramount","sumthing else" 
GeoIsp:"undefined","sumthing else" 
GeoIsp:"undefined","sumthing else" 
GeoIsp:"178","sumthing else" 
GeoIsp:"178","sumthing else" 
'|awk -F, '{a[$1]++;}END{for(x in a)if(x)print x,a[x]}' 
GeoIsp:"Paramount" 1 
GeoIsp:"undefined" 2 
GeoIsp:"178" 2
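The awk one-liner splits on commas and counts field $1, so, as Rajeev's comment points out, it misses the value when it is not the first field. Below is a minimal sketch of an awk variant that searches the whole line with match() instead. awk_count_key is a hypothetical name, and like the grep -o version it assumes double-quoted values without escaped quotes. It counts only the first match on each line.

```shell
#!/bin/sh
# awk_count_key (hypothetical name): count distinct "key":"value" pairs found
# anywhere on the line, instead of relying on the first comma-separated field.
awk_count_key() {
  awk -v key="$1" '
    BEGIN { pat = "\"" key "\":\"[^\"]*\"" }  # e.g. "geoIs":"[^"]*"
    match($0, pat) {                          # sets RSTART and RLENGTH on success
      count[substr($0, RSTART, RLENGTH)]++    # only the first match per line
    }
    END {
      for (v in count)                        # iteration order is unspecified
        print v, count[v]
    }'
}

# Made-up sample lines; the last one has no geoIs pair and is skipped.
printf '%s\n' \
  '{"geoCity":"Vermont","geoIs":"undefined"}' \
  '{"geoIs":"178","geoCity":"Vermont"}' \
  '{"geoArea":"0","geoIs":"178","geoMetro":"0"}' \
  '{"geoArea":"0"}' |
  awk_count_key geoIs
```

On the real data this would be zcat file.gz | awk_count_key geoIs. As with the original one-liner, the output order of for (x in a) is not defined, so pipe through sort if a stable order matters.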