Grepping IP addresses from a log

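I am not very good at using "basic?" unix commands, and this question puts my knowledge to the test even more. What I would like to do is grep all IP addresses from a log (e.g. access.log from apache) and count how often they occur. Can I do that with one command, or do I need to write a script for it?

BR, Paul Peelen

Source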

2011-04-20 Paul Peelen

Take a look at my answer on Unix StackExchange: https://unix.stackexchange.com/a/389565/249079 – Ganapathy 2017-09-29 05:36:56

You'll need a short pipeline at least.

sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c

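This prints each IP (IPv4 only), sorted and prefixed with its count. I tested it with apache2's access.log (the log format is configurable, so you'll need to check yours), and it worked for me. It assumes the IP address is the first thing on each line.

The sed collects the IP addresses (actually it looks for 4 sets of digits with periods in between) and replaces the entire line with the match. -e t skips to the next line if the substitution succeeded, and -e d deletes the line if there was no IP address on it. sort sorts :) and uniq -c counts runs of consecutive identical lines, which, since we've already sorted them, corresponds to the total count.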
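For anyone who wants to try this without a real log, here is a minimal sketch of the same pipeline wrapped in a function. The name count_ips and the sample lines are made up for illustration. It assumes a sed that accepts -E (GNU and BSD both do), which lets the pattern use a plain + instead of the GNU-specific \+. It also anchors the pattern at the start of the line, matching the assumption stated above, and adds a final sort -rn so the busiest addresses come first.

```shell
#!/bin/sh
# count_ips is a hypothetical helper name; it reads a log on stdin.
count_ips() {
    # s///: replace the whole line with the leading IPv4-looking token.
    # t:    if the substitution happened, skip the rest of the script (line is printed).
    # d:    otherwise delete the line, so lines without a leading address vanish.
    sed -E -e 's/^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*$/\1/' -e t -e d |
        sort | uniq -c | sort -rn
}

# Made-up sample lines in Apache's common log layout.
count_ips <<'EOF'
192.0.2.10 - - [20/Apr/2011:18:00:01 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [20/Apr/2011:18:00:02 +0000] "GET /a HTTP/1.1" 200 128
192.0.2.10 - - [20/Apr/2011:18:00:03 +0000] "GET /b HTTP/1.1" 404 64
EOF
```

On a real file that would be something like count_ips < access.log.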

Source

2011-04-20 18:28:28 falstro

You can do the following (where datafile is the name of the log file):

egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' datafile | sort | uniq -c

EDIT: I missed the part about counting the addresses; added that now.

Source

2011-04-20 18:27:36

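This will fail, because egrep prints the entire line including the timestamp, so every line will be unique. You need to output the IP address alone and discard the rest of the line (or consider only the IP when checking for uniqueness). – falstro 2014-01-09 07:35:47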
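One way to output the address alone, sketched under the assumption that your grep supports -o (GNU and BSD grep do, but it is not in POSIX): -o prints only the matched text instead of the whole line, and the ^ anchor keeps the match at the start of the line. The name leading_ips and the sample lines are invented for the example.

```shell
#!/bin/sh
# leading_ips is a hypothetical helper name; it reads a log on stdin.
leading_ips() {
    # -o prints only the matched text; ^ restricts the match to the line start.
    grep -oE '^[0-9]{1,3}(\.[0-9]{1,3}){3}' | sort | uniq -c
}

# Made-up sample: the timestamps differ, but only the addresses reach uniq.
printf '%s\n' \
    '192.0.2.10 - - [20/Apr/2011:18:00:01 +0000] "GET / HTTP/1.1" 200 512' \
    '192.0.2.10 - - [20/Apr/2011:18:00:09 +0000] "GET /x HTTP/1.1" 200 99' \
    '198.51.100.7 - - [20/Apr/2011:18:00:12 +0000] "GET /y HTTP/1.1" 200 7' |
    leading_ips
```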

-1

Using sed:

$ sed 's/.*\(<regex_for_ip_address>\).*/\1/' <filename> | sort | uniq -c

You can search the Internet for a working IP address regex and substitute it for <regex_for_ip_address>, e.g. from the answers to a related question on Stack Overflow.

Source

2011-04-20 18:55:30 sahaj

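As Dave Tarsi points out, this could fail, since it will catch things like browser versions that look like valid IP addresses. You need to know where on the line the IP address is (at the start) and pick only those. – falstro 2014-01-09 07:32:28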

egrep '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' | awk '{print $1}' | sort | uniq -c

Source

2013-11-04 05:53:52 Snowwolf

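As Dave Tarsi points out, this could actually fail, since it will catch things like browser versions that look like valid IP addresses. You need to know where on the line the IP address is (at the start) and pick only those. – falstro 2014-01-09 07:33:01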
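A rough sketch of "pick only the address at the start", assuming the default Apache layout where the client address is the first whitespace-separated field. awk checks that the whole first field looks like a dotted quad and counts it, so a version string such as 4.0.0.42 inside the user agent never gets looked at. The function name count_client_ips and the sample lines are hypothetical.

```shell
#!/bin/sh
# count_client_ips is a hypothetical helper name; it reads a log on stdin.
count_client_ips() {
    # Count a line only when its first field is four dot-separated numbers.
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' |
        sort -rn
}

# Made-up sample: the first line carries a version number that looks like an IP.
printf '%s\n' \
    '203.0.113.5 - - [t] "GET / HTTP/1.1" 200 1 "-" "Mozilla/4.0.0.42"' \
    '203.0.113.5 - - [t] "GET /a HTTP/1.1" 200 1 "-" "x"' \
    '192.0.2.10 - - [t] "GET /b HTTP/1.1" 200 1 "-" "x"' |
    count_client_ips
```

This only checks the shape of the field, not that each number is in the 0-255 range.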

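Here is a script I wrote several years ago. It greps addresses out of Apache access logs. I just tried it on Ubuntu 11.10 (oneiric) 3.0.0-32-generic #51-Ubuntu SMP Thu Mar 21 15:51:26 UTC 2013 i686 i686 i386 GNU/Linux and it works fine. Use Gvim or Vim to read the resulting file, which will be called unique_visits; it lists the unique IPs in a column. The key is in the lines that use grep: those expressions extract the IP address numbers. IPv4 only. You may need to go through and update the browser version numbers. Another similar script that I wrote for a Slackware system is here: http://www.perpetualpc.net/srtd_bkmrk.html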

#!/bin/sh 
#eliminate search engine referrals and zombie hunters. combined_log is the original file 
egrep '(google)|(yahoo)|(mamma)|(query)|(msn)|(ask.com)|(search)|(altavista)|(images.google)|(xb1)|(cmd.exe)|(trexmod)|(robots.txt)|(copernic.com)|(POST)' combined_log > search 
#now sort them to eliminate duplicates and put them in order 
sort -un search > search_sort 
#do the same with original file 
sort -un combined_log > combined_log_sort 
#now get all the ip addresses. only the numbers 
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' search_sort > search_sort_ip 
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' combined_log_sort > combined_log_sort_ip 
sdiff -s combined_log_sort_ip search_sort_ip > final_result_ip 
#get rid of the extra column 
grep -o '^\|[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' final_result_ip > bookmarked_ip 
#remove stuff like browser versions and system versions 
egrep -v '(4.4.2.0)|(1.6.3.1)|(0.9.2.1)|(4.0.0.42)|(4.1.8.0)|(1.305.2.109)|(1.305.2.12)|(0.0.43.45)|(5.0.0.0)|(1.6.2.0)|(4.4.5.0)|(1.305.2.137)|(4.3.5.0)|(1.2.0.7)|(4.1.5.0)|(5.0.2.6)|(4.4.9.0)|(6.1.0.1)|(4.4.9.0)|(5.0.8.6)|(5.0.2.4)|(4.4.8.0)|(4.4.6.0)' bookmarked_ip > unique_visits 

exit 0

Source

2013-11-17 16:41:35
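For comparison, here is a much shorter sketch of the same general idea (filter out search-engine traffic, then list the remaining addresses once each). It is a different technique, not a port of the script above: it skips the temporary files and the sdiff step, and it takes the address from the first field instead of filtering out version numbers afterwards. The name unique_visitors, the shortened pattern list, and the sample data are all assumptions for illustration.

```shell
#!/bin/sh
# unique_visitors is a hypothetical helper name; it reads a combined-format log on stdin.
unique_visitors() {
    # 1. Drop lines mentioning search engines or robots.txt (shortened pattern list).
    # 2. Keep the first field only when it looks like an IPv4 address.
    # 3. List each remaining address once.
    grep -Ev 'google|yahoo|msn|robots\.txt' |
        awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1 }' |
        sort -u
}

# Made-up sample lines.
printf '%s\n' \
    '203.0.113.5 - - [t] "GET / HTTP/1.1" 200 1 "-" "Mozilla/4.0.0.42"' \
    '192.0.2.10 - - [t] "GET /robots.txt HTTP/1.1" 200 1 "-" "bot"' \
    '198.51.100.7 - - [t] "GET /a HTTP/1.1" 200 1 "http://www.google.com/" "x"' \
    '203.0.113.5 - - [t] "GET /b HTTP/1.1" 200 1 "-" "x"' |
    unique_visitors
```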

-1

cat access.log |egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' |sort|uniq -c|sort

Source

2014-03-26 17:47:01 cint
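A note on ordering: uniq -c only merges identical lines that sit next to each other, so the extracted addresses have to be sorted before they are counted. The small sketch below shows the difference on three made-up addresses. The variable names are just for the demonstration.

```shell
#!/bin/sh
# Made-up addresses, deliberately interleaved.
ips='192.0.2.10
198.51.100.7
192.0.2.10'

# Without sorting, uniq -c sees three separate runs of length one.
unsorted_counts=$(printf '%s\n' "$ips" | uniq -c)

# Sorting first puts identical lines next to each other, so they are merged;
# the trailing sort -rn then orders the result by count.
sorted_counts=$(printf '%s\n' "$ips" | sort | uniq -c | sort -rn)

printf 'uniq -c alone:\n%s\n\nsort | uniq -c | sort -rn:\n%s\n' \
    "$unsorted_counts" "$sorted_counts"
```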
