I have a Rails server log file in the following format, and I need to create regular expressions to parse it.
Started <REQUEST_TYPE_1> <URL_1> for <IP_1> at <TIMESTAMP_1>
Processing by <controller#action_1> as <REQUEST_FORMAT_1>
Parameters: <parameters_1>
<Some logs from code>
Rendered <some_template_1> (<timetaken_1>)
Completed <RESPONSE_CODE_1> in <TIME_1>
Started <REQUEST_TYPE_2> <URL_2> for <IP_2> at <TIMESTAMP_2>
Processing by <controller#action_2> as <REQUEST_FORMAT_2>
Parameters: <parameters_2>
<Some logs from code>
Completed <RESPONSE_CODE_2> in <TIME_2>
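Now I need to parse this log and extract all of the REQUEST_TYPE, URL, IP, TIMESTAMP, REQUEST_FORMAT, and RESPONSE_CODE values from it. I am struggling to write a good regular expression for this in Java/Ruby. The <> brackets do not appear in the actual input; I added them for readability and to mask the real data.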
Sample requests:
Started GET "/google.com/2" for 127.0.0.1 at Tue Dec 01 12:01:13 +0530 2015
Processing by MyController#method as JS
Parameters: {"abc" => "xyz"}
[LOG] 3 : User text log
Completed 200 OK in 26ms (Views: 3.3ms | ActiveRecord: 2.9ms)
Started POST "/google.com/543" for 127.0.1.1 at Tue Dec 01 13:13:16 +0530 2015
Processing by MyController#method_2 as JSON
Parameters: {"efg" => "uvw"}
Completed 404 Not Authorized in 65ms (Views: 1.5ms | ActiveRecord: 1.0ms)
Expected output:
request_types = ['GET', 'POST']
urls = ['/google.com/2','/google.com/543']
ips = ['127.0.0.1','127.0.1.1']
timestamps = ['Tue Dec 01 12:01:13 +0530 2015','Tue Dec 01 13:13:16 +0530 2015']
request_formats = ['JS','JSON']
response_codes = ['200 OK','404 Not Authorized']
I was able to write the regular expressions below, but they do not work as expected.
request_types = /Started \w+/ # Expected array of all request types
urls = /"\/.*\/"/ # Expected array of all URLs
ips = /"d{1,3}.d{1,3}.d{1,3}.d{1,3}"/ # Expected array of all IPs
timestamps = /at \w+/
request_formats = /as \w+/
response_codes = /Completed \w+/
I would appreciate some help creating regular expressions in Java/Ruby that extract these parameters from the given input. If possible, I would prefer Java.
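For reference, the attempts above fail for a few identifiable reasons: d{1,3} is missing the backslash (\d), the IP in the log is not wrapped in quotes, the URL pattern requires a trailing slash before the closing quote, \w+ stops at the first space so the timestamp and the response text get cut off, and without capture groups each match still includes the keyword ("Started GET" rather than "GET"). The following is a minimal Java sketch of one possible approach, not a definitive solution. It assumes every request logs its "Started", "Processing by" and "Completed" lines in the shape shown in the sample, and that the duration is always written as a whole number of milliseconds. The class name RailsLogFields and the helper extract are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RailsLogFields {
    // MULTILINE lets ^ and $ match at the start and end of every line.
    // Groups: 1 = request type, 2 = URL (without quotes), 3 = IP, 4 = timestamp.
    public static final Pattern STARTED = Pattern.compile(
        "^Started (\\w+) \"([^\"]*)\" for (\\S+) at (.+)$", Pattern.MULTILINE);

    // Group 1 = request format (JS, JSON, HTML, ...).
    public static final Pattern FORMAT = Pattern.compile(
        "^Processing by \\S+ as (\\S+)", Pattern.MULTILINE);

    // Group 1 = status code plus status text; the lazy .+? stops at the
    // first " in <digits>ms" (assumption: integer millisecond durations).
    public static final Pattern COMPLETED = Pattern.compile(
        "^Completed (\\d{3} .+?) in \\d+ms", Pattern.MULTILINE);

    // Collects the given capture group from every match of the pattern in the log.
    public static List<String> extract(Pattern pattern, int group, String log) {
        List<String> values = new ArrayList<>();
        Matcher matcher = pattern.matcher(log);
        while (matcher.find()) {
            values.add(matcher.group(group));
        }
        return values;
    }

    public static void main(String[] args) {
        String log =
            "Started GET \"/google.com/2\" for 127.0.0.1 at Tue Dec 01 12:01:13 +0530 2015\n"
          + "Processing by MyController#method as JS\n"
          + "Parameters: {\"abc\" => \"xyz\"}\n"
          + "[LOG] 3 : User text log\n"
          + "Completed 200 OK in 26ms (Views: 3.3ms | ActiveRecord: 2.9ms)\n";

        System.out.println("request_types   = " + extract(STARTED, 1, log));
        System.out.println("urls            = " + extract(STARTED, 2, log));
        System.out.println("ips             = " + extract(STARTED, 3, log));
        System.out.println("timestamps      = " + extract(STARTED, 4, log));
        System.out.println("request_formats = " + extract(FORMAT, 1, log));
        System.out.println("response_codes  = " + extract(COMPLETED, 1, log));
    }
}
```

Anchoring each pattern to its keyword at the start of a line keeps free-form application log lines (such as "[LOG] 3 : User text log") from producing false matches. The IP is matched loosely with \S+ rather than with a strict dotted-quad pattern, which is a design choice for brevity and would need tightening if validation matters.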
Does your original log file also contain these brackets ('<>')? – Jan
No. They are only there to mask the real data. – Abhishek
Something like https://regex101.com/r/uI6oV1/3? – Jan