通过首先对sort
进行排序,你可以申请uniq
。
这似乎文件就好了排序:
$ cat test.csv
[email protected],2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 00:58:29.646465785,2x3.net,256.255.255.0
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
$ sort test.csv
[email protected],2009-11-27 00:58:29.646465785,2x3.net,256.255.255.0
[email protected],2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
$ sort test.csv | uniq
[email protected],2009-11-27 00:58:29.646465785,2x3.net,256.255.255.0
[email protected],2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
你也可以做一些AWK魔术:
$ awk -F, '{ lines[$1] = $0 } END { for (l in lines) print lines[l] }' test.csv
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
[email protected],2009-11-27 00:58:29.646465785,2x3.net,256.255.255.0
如果该列中包含逗号本身(带引号) – user775187 2011-06-17 10:18:56
这是不行的唯一的事情是排序不会给你一个计数...我认为.. – Rodo 2014-01-14 11:00:51
为什么你需要,1在-k1,1?为什么不只是-k1? – 2014-11-24 20:10:28