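Grouping and filtering results

I have a pipe-delimited file, and I need to group by one field and get the total number of occurrences of each value.

My input file looks like: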

96472|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
96472|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
96472|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12 
214126|Text1|6|A|City|Austin, TX|0123|9899|2017-02-12

This is how I am doing it:

cut -d'|' -f1 somefile.txt | cut -d'-' -f1 | sort | uniq -c 
The output is:
3 96472 
10 214126

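Basically, I want to count the occurrences of a field, like a GROUP BY clause in SQL. In my example, the values in field/column 1 are repeated 3 and 10 times.

I am sure there is a better way to do this. I also want to filter the result down to the records that occur fewer than 10 times: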

cut -d'|' -f1 somefile.txt | cut -d'-' -f1 | sort | uniq -c | grep -v 10

Is there a good way to achieve both?

Source

2017-01-03 Bhaskar Mishra

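Thanks, I have just put down the input file for testing. –

Expected output? – Inian

I want the number of occurrences of some field, like a GROUP BY clause in SQL. In my example, I show that f1 has the same value repeated 10 times. –

A simple awk script should be enough here, without any other utilities. For your input file, the output is as follows: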

awk -F"|" '{count[$1]++}END{for (i in count) print count[i],i}' file 
3 96472 
10 214126

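The idea is that count[$1]++ increments a counter for each value of $1 seen in the file, and once the whole file has been processed, the END clause prints the total count for each unique value of $1.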
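To make the idea concrete, here is a minimal self-contained sketch. The temporary file and the shortened rows are assumptions for illustration, not the asker's real data. The trailing sort -n is added only because awk does not specify the iteration order of for (i in count).

```shell
# Build a small pipe-delimited sample (hypothetical, shortened rows).
tmpfile=$(mktemp)
printf '%s\n' \
  '96472|Text1|6|A' \
  '214126|Text1|6|A' \
  '96472|Text1|6|A' \
  '214126|Text1|6|A' \
  '214126|Text1|6|A' > "$tmpfile"

# Count occurrences of field 1, then sort numerically by count so the
# output order no longer depends on awk's array traversal order.
result=$(awk -F'|' '{ count[$1]++ } END { for (i in count) print count[i], i }' "$tmpfile" | sort -n)

echo "$result"
rm -f "$tmpfile"
```

With these five rows, the key 96472 is counted twice and 214126 three times.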

Adding a filter lists only the values whose count is less than 10:

awk -F"|" '{count[$1]++}END{for (i in count) if (count[i] < 10){print count[i],i}}' file 
3 96472
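If the threshold needs to change between runs, one option is to pass it in with awk's -v option instead of hard-coding it. This is only a sketch: the variable name max and the generated rows are assumptions chosen to mirror the 3 and 10 repetitions in the question.

```shell
tmpfile=$(mktemp)
# 3 rows for key 96472 and 10 rows for key 214126 (hypothetical rows).
for n in 1 2 3; do echo '96472|Text1|6|A'; done > "$tmpfile"
for n in 1 2 3 4 5 6 7 8 9 10; do echo '214126|Text1|6|A'; done >> "$tmpfile"

# "max" is an illustrative variable name set from the command line.
filtered=$(awk -F'|' -v max=10 '
  { count[$1]++ }
  END { for (key in count) if (count[key] < max) print count[key], key }
' "$tmpfile")

echo "$filtered"
rm -f "$tmpfile"
```

Only the key with 3 occurrences passes the count < max test, so a single line is printed.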

Source

2017-01-03 12:21:15 Inian

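Great. How do I show only the records with a count < 10? I am currently using 'grep -v' in my example. –

@BhaskarMishra: Refer to my update. When 'awk' is available, you don't need any other tool; it is very powerful! – Inian

@BhaskarMishra: Don't forget to upvote/accept the answer once you find that it solves your problem. – Inian

Just extending your command: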

cut -d'|' -f1 somefile.txt | cut -d'-' -f1 | sort | uniq -c | awk '{ if ($1 < 10) print $0 }'

Source

2017-01-03 12:25:04 Lino

Assuming your data is in the file txt:

sort -t '|' -k 1 txt | uniq -c | awk -F"|" '{print $1}' | awk '{if($1 < 10) print $0}'

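The sort command:

splits each line into tokens on the '|' character (-t '|'), then
sorts on a key that starts at the first token (-k 1).

Note that uniq -c compares whole lines, so this pipeline groups by the first field only when rows sharing that field are otherwise identical, as they are in the sample data.

Source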
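When rows with the same key differ in other columns, one way to keep the grouping correct is to cut out the key before counting. A minimal sketch, assuming made-up rows with varying dates and using -k1,1 to restrict the sort key to the first field only:

```shell
tmpfile=$(mktemp)
# Hypothetical rows: same key, different dates in the last column.
for d in 10 11 12; do echo "96472|Text1|6|A|2017-02-$d"; done > "$tmpfile"
for d in 10 11 12 13 14 15 16 17 18 19; do echo "214126|Text1|6|A|2017-02-$d"; done >> "$tmpfile"

# Sort on field 1 only, keep just that field, count runs of equal keys,
# then keep the keys that occur fewer than 10 times.
keys=$(sort -t'|' -k1,1 "$tmpfile" | cut -d'|' -f1 | uniq -c | awk '$1 < 10 { print $1, $2 }')

echo "$keys"
rm -f "$tmpfile"
```

Printing $1 and $2 explicitly in the last awk step drops the leading padding that uniq -c adds before the count.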

2017-01-03 12:32:29 sameerkn
