下面我有一些样本数据线....正则表达式的表达 - 删除匹配多个选项
100.200.300.40 - - [02/Feb/2012:12:18:35 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 132189
100.200.300.40 - - [02/Feb/2012:12:18:35 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 106866
100.200.300.40 - - [02/Feb/2012:12:18:35 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 103461
100.200.300.40 - - [02/Feb/2012:12:19:10 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 106866
100.200.300.40 - - [02/Feb/2012:12:19:10 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 106866
100.200.300.40 - - [02/Feb/2012:12:19:10 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 106866
300.230.100.10 - - [02/Feb/2012:12:20:55 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 28017662
200.100.600.30 - - [02/Feb/2012:12:27:22 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 464787
200.100.600.30 - - [02/Feb/2012:12:27:22 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 463747
200.100.600.30 - - [02/Feb/2012:12:27:22 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 434485
200.100.600.30 - - [02/Feb/2012:12:27:54 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 330269
664.387.880.60 - - [02/Feb/2012:12:32:03 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 266372
664.387.880.60 - - [02/Feb/2012:12:32:34 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 176348
我试图使用正则表达式表达找到符合以下条件的行....
- 重复的IP
- 重复时间
- 重名
然后我想删除这些行,所以只剩下一个,最后有一个文件大小稍微不同,否则我可以删除重复的行。
我已经设法检测IP http://regexr.com?2vv5c但是就我所知,任何人都可以帮忙吗?
UPDATE 在回应评论留下,如下的原始数据....
100.200.300.40 - - [02/Feb/2012:12:18:35 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 132189
100.200.300.40 - - [02/Feb/2012:12:18:35 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 106866
100.200.300.40 - - [02/Feb/2012:12:18:35 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 103461
100.200.300.40 - - [02/Feb/2012:12:19:10 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 106866
100.200.300.40 - - [02/Feb/2012:12:19:10 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 106866
100.200.300.40 - - [02/Feb/2012:12:19:10 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 106866
300.230.100.10 - - [02/Feb/2012:12:20:55 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 28017662
200.100.600.30 - - [02/Feb/2012:12:27:22 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 464787
200.100.600.30 - - [02/Feb/2012:12:27:22 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 463747
200.100.600.30 - - [02/Feb/2012:12:27:22 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 434485
200.100.600.30 - - [02/Feb/2012:12:27:54 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 330269
664.387.880.60 - - [02/Feb/2012:12:32:03 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 266372
664.387.880.60 - - [02/Feb/2012:12:32:34 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 176348
下应保持....
100.200.300.40 - - [02/Feb/2012:12:18:35 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 103461
100.200.300.40 - - [02/Feb/2012:12:19:10 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 106866
300.230.100.10 - - [02/Feb/2012:12:20:55 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.0" 200 28017662
200.100.600.30 - - [02/Feb/2012:12:27:22 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 434485
200.100.600.30 - - [02/Feb/2012:12:27:54 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 330269
664.387.880.60 - - [02/Feb/2012:12:32:03 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 266372
664.387.880.60 - - [02/Feb/2012:12:32:34 +0000] temp/newfolder/resource/newitem.pdf HTTP/1.1" 200 176348
一些例子什么应该匹配,什么不应该匹配总是有益的。 – aioobe 2012-02-09 09:57:22
你真的需要文件大小吗?如果没有,你可以很容易地删除它,然后使用'sort'和'uniq'来过滤掉重复项。 – beerbajay 2012-02-09 10:21:31