0

I have a log file as shown below. How can I use Logstash to filter out the JSON data from a log4j file?

2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] in getCSRFToken 
2014-12-24 09:41:29,383 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] CSRFToken set successfully. 
2014-12-24 09:44:26,607 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] in getCSRFToken 
2014-12-24 09:44:26,609 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] CSRFToken set successfully. 
2014-12-26 09:55:28,399 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] in getCSRFToken 
2014-12-26 09:55:28,401 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] CSRFToken set successfully. 
2014-12-26 11:10:32,135 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] in getCSRFToken 
2014-12-26 11:10:32,136 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] CSRFToken set successfully. 
2014-12-26 11:12:40,500 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] in getCSRFToken 
2014-12-26 11:12:40,501 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] CSRFToken set successfully. 
2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit [http-bio-8080-exec-9] EnquiryDetails : {"createdTime":1448880669029,"modifiedTime":null,"active":true,"deleted":false,"deletedOn":-1,"guid":null,"uuid":null,"id":130771,"instanceId":130665,"pos":"","channel":"Web","flightNo":"TWBL2DL2","orig":"BLR","dest":"DEL","cabCls":"ECONOMY","logCls":"Y","noOfPaxs":1,"scheduleEntryId":130661,"travelDateTime":[2015,12,1,21,30],"enquiryDateTime":[2015,11,30,16,21,9,23000000]} 

As you can see, the last line contains some JSON data. I want to configure Logstash to extract that JSON data. Here is my Logstash configuration file:

input { 
    file { 
    path => "C:/Users/TESTER/Desktop/files/test1.log" 
    type => "test" 
     start_position => "beginning" 
    } 
} 


filter { 
    grok { 
    match => [ "message" , "timestamp : %{DATESTAMP:timestamp}", "severity: %{WORD:severity}", "clazz: %{JAVACLASS:clazz}", "selco: %{NOTSPACE:selco}", "testerField: (?<ENQDTLS>EnquiryDetails :)"] 

     } 
} 


output { 
    elasticsearch { 
     hosts => "localhost" 
     index => "test1" 
    } 
    stdout {} 
} 

However, this is my Logstash output:

C:\logstash-2.0.0\bin>logstash -f test1.conf 
io/console not supported; tty will not be manipulated 
Default settings used: Filter workers: 2 
Logstash startup completed 
2016-01-08T08:02:02.029Z TW 2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] in getCSRFToken 
2016-01-08T08:02:02.029Z TW 2014-12-24 09:44:26,607 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] in getCSRFToken 
2016-01-08T08:02:02.029Z TW 2014-12-24 09:44:26,609 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] CSRFToken set successfully. 
2016-01-08T08:02:02.029Z TW 2014-12-26 09:55:28,399 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] in getCSRFToken 
2016-01-08T08:02:02.029Z TW 2014-12-26 09:55:28,401 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] CSRFToken set successfully. 
2016-01-08T08:02:02.029Z TW 2014-12-26 11:10:32,135 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] in getCSRFToken 
2016-01-08T08:02:02.029Z TW 2014-12-26 11:10:32,136 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] CSRFToken set successfully. 
2016-01-08T08:02:02.029Z TW 2014-12-24 09:41:29,383 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] CSRFToken set successfully. 
2016-01-08T08:02:02.029Z TW 2014-12-26 11:12:40,500 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] in getCSRFToken 
2016-01-08T08:02:02.029Z TW 2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit [http-bio-8080-exec-9] EnquiryDetails : {"createdTime":1448880669029,"modifiedTime":null,"active":true,"deleted":false,"deletedOn":-1,"guid":null,"uuid":null,"id":130771,"instanceId":130665,"pos":"","channel":"Web","flightNo":"TWBL2DL2","orig":"BLR","dest":"DEL","cabCls":"ECONOMY","logCls":"Y","noOfPaxs":1,"scheduleEntryId":130661,"travelDateTime":[2015,12,1,21,30],"enquiryDateTime":[2015,11,30,16,21,9,23000000]} 
2016-01-08T08:02:02.029Z TW 2014-12-26 11:12:40,501 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] CSRFToken set successfully. 

Can someone please tell me what I am doing wrong here? Thanks.

Answers

0

I found a solution to my problem.

input { 
    file { 
    path => "C:/Users/TESTER/Desktop/elk Files 8-1-2015/test1.log" 
     start_position => "beginning" 
    } 
} 


filter { 
    grok { 
        # the tag is only added when the pattern (and therefore "EnquiryDetails :") matches 
        match => { "message" => "%{DATESTAMP:timestamp} %{WORD:severity} %{JAVACLASS:clazz} %{NOTSPACE:selco} (?<ENQDTLS>EnquiryDetails :) (?<JSONDATA>.*)" } 
        add_tag => [ "ENQDTLS" ] 
    } 

    # drop every line that did not get the tag, i.e. every line without the JSON payload 
    if "ENQDTLS" not in [tags] { 
        drop { } 
    } 

    mutate { 
        remove_tag => [ "ENQDTLS" ] 
    } 

    # parse the captured JSON string into fields on the event 
    json { 
        source => "JSONDATA" 
    } 

    # the helper fields are no longer needed once the JSON has been expanded 
    mutate { 
        remove_field => [ "timestamp", "clazz", "selco", "severity", "ENQDTLS", "JSONDATA" ] 
    } 
} 


output { 
    elasticsearch { 
     hosts => "localhost" 
     index => "test3" 
    } 
    stdout { 
    codec => rubydebug 
    } 
} 

So, what I'm doing here is using grok to filter out any line that does not contain the keyword "EnquiryDetails", and then I process the JSON data in the line that does. I hope this helps anyone else who runs into the same problem. Also, since I'm new to this, I'd like to know whether this is a good approach.
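For anyone wondering what comes out the other end: with the config above and the rubydebug codec, the event for the EnquiryDetails line should look roughly like the sketch below. The JSON fields are taken from the sample log line; @timestamp, host and path are placeholders and will of course differ on your machine, and the message field is truncated here.

{ 
           "message" => "2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit [http-bio-8080-exec-9] EnquiryDetails : {...}", 
          "@version" => "1", 
        "@timestamp" => "2016-01-08T08:02:02.029Z", 
              "host" => "TW", 
              "path" => "C:/Users/TESTER/Desktop/elk Files 8-1-2015/test1.log", 
       "createdTime" => 1448880669029, 
            "active" => true, 
           "deleted" => false, 
         "deletedOn" => -1, 
                "id" => 130771, 
        "instanceId" => 130665, 
           "channel" => "Web", 
          "flightNo" => "TWBL2DL2", 
              "orig" => "BLR", 
              "dest" => "DEL", 
            "cabCls" => "ECONOMY", 
          "noOfPaxs" => 1, 
            ... 
} 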

+0

In your example, most of the lines are not EnquiryDetails. It would be more efficient to drop those lines before attempting the grok (etc.): if [message] !~ /EnquiryDetails/ { drop {} } .... –
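In config terms, that suggestion would put the conditional at the very top of the filter block, before any grok work is done (just a sketch, not tested against your data):

filter { 
    # discard everything that is not an EnquiryDetails line before running grok 
    if [message] !~ /EnquiryDetails/ { 
        drop { } 
    } 

    # ... the grok / json / mutate filters from the answer above ... 
} 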

+0

Thanks :) Will do that. –

1

You don't say what "wrong" you're actually seeing, but let's assume you're concerned about the fields missing from your output.

First, use the rubydebug or json codec in your stdout {} output stanza. It will show you a lot more detail.
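For example, a minimal output section for debugging might look like this (the elasticsearch output is left out just to keep the sketch short):

output { 
    stdout { 
        codec => rubydebug    # pretty-prints every field of each event 
    } 
} 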

Second, it looks like your grok {} is messed up. grok {} takes an input field and applies one or more regular expressions to it. You gave it the input ("message"), but this regexp:

"timestamp : %{DATESTAMP:timestamp}" 

does not match your input, because your input does not contain the literal string "timestamp".

You need something more like:

"%{DATESTAMP} %{WORD:severity}" (etc) 

I would recommend setting up one grok {} stanza to pull off all the common information (everything up to the "]"), and then another to process the different types of messages.
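As a rough illustration of that idea (field names such as thread, msg_body and json_payload are just placeholders I picked, and the patterns assume the stock grok pattern set):

filter { 
    grok { 
        # first pass: the fields every log line shares, up to the closing "]" 
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} %{JAVACLASS:clazz} \[%{DATA:thread}\] %{GREEDYDATA:msg_body}" } 
    } 

    # second pass: only the EnquiryDetails lines carry a JSON payload 
    if [msg_body] =~ /^EnquiryDetails :/ { 
        grok { 
            match => { "msg_body" => "EnquiryDetails : %{GREEDYDATA:json_payload}" } 
        } 
        json { 
            source => "json_payload" 
        } 
    } 
} 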

+0

Thanks Alain, this helped me a lot. What I wanted was to process the JSON data based on the keyword preceding it. I have solved the problem and will post my new configuration here. –