2017-02-24 211 views

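How to query Logstash via curl and return only specific fields

Right now I'm using a "match_all" query to fetch the data that Logstash is processing. The output I get contains every field that is part of the event, as it should. Here is my query: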

{
    "query": {
        "match_all": {}
    },
    "size": 1,
    "sort": [
        {
            "@timestamp": {
                "order": "desc"
            }
        }
    ]
}

As you can see, I also sort the results so that I always get the most recent event.

Here is an example of my output:

{ 
    "took" : 1, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 5, 
    "successful" : 5, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 15768, 
    "max_score" : null, 
    "hits" : [ 
     { 
     "_index" : "filebeat-2017.02.24", 
     "_type" : "bro", 
     "_id" : "AVpx-pFtiEtl3Zqhg8tF", 
     "_score" : null, 
     "_source" : { 
      "resp_pkts" : 0, 
      "source" : "/usr/local/bro/logs/current/conn.log", 
      "type" : "bro", 
      "id_orig_p" : 56058, 
      "duration" : 848.388112, 
      "local_resp" : true, 
      "uid" : "CPndOf4NNf9CzTILFi", 
      "id_orig_h" : "192.168.137.130", 
      "conn_state" : "OTH", 
      "@version" : "1", 
      "beat" : { 
      "hostname" : "localhost.localdomain", 
      "name" : "localhost.localdomain", 
      "version" : "5.2.0" 
      }, 
      "host" : "localhost.localdomain", 
      "id_resp_h" : "192.168.137.141", 
      "id_resp_p" : 22, 
      "resp_ip_bytes" : 0, 
      "offset" : 115612, 
      "orig_bytes" : 32052, 
      "local_orig" : true, 
      "input_type" : "log", 
      "orig_ip_bytes" : 102980, 
      "orig_pkts" : 1364, 
      "missed_bytes" : 0, 
      "history" : "DcA", 
      "tunnel_parents" : [ ], 
      "message" : "{\"ts\":1487969779.653504,\"uid\":\"CPndOf4NNf9CzTILFi\",\"id_orig_h\":\"192.168.137.130\",\"id_orig_p\":56058,\"id_resp_h\":\"192.168.137.141\",\"id_resp_p\":22,\"proto\":\"tcp\",\"duration\":848.388112,\"orig_bytes\":32052,\"resp_bytes\":0,\"conn_state\":\"OTH\",\"local_orig\":true,\"local_resp\":true,\"missed_bytes\":0,\"history\":\"DcA\",\"orig_pkts\":1364,\"orig_ip_bytes\":102980,\"resp_pkts\":0,\"resp_ip_bytes\":0,\"tunnel_parents\":[]}", 
      "tags" : [ 
      "beats_input_codec_plain_applied" 
      ], 
      "@timestamp" : "2017-02-24T21:15:29.414Z", 
      "resp_bytes" : 0, 
      "proto" : "tcp", 
      "fields" : { 
      "sensorType" : "networksensor" 
      }, 
      "ts" : 1.487969779653504E9 
     }, 
     "sort" : [ 
      1487970929414 
     ] 
     } 
    ] 
    } 
} 

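As you can see, this is a lot of output for an external application to process (it is written in C#, so all of these strings cause a lot of garbage collection), and I simply don't need most of it.

My question is: how can I set up my query so that I only fetch the fields I need?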

Answers


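In 5.x there was a change that lets you do _source filtering. It is described in the Elasticsearch documentation on source filtering, and the query would look like this: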

{
    "query": {
        "match_all": {}
    },
    "size": 1,
    "_source": ["a", "b"],
    ...

And the result looks like this:

{ 
    "took" : 2, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 5, 
    "successful" : 5, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 1, 
    "max_score" : 1.0, 
    "hits" : [ 
     { 
     "_index" : "xxx", 
     "_type" : "xxx", 
     "_id" : "xxx", 
     "_score" : 1.0, 
     "_source" : { 
      "a" : 1, 
      "b" : "2" 
     } 
     } 
    ] 
    } 
} 
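Combining the query from the question with the _source filter gives a complete request body. The following is a minimal sketch rather than a definitive client: the URL (localhost:9200 and a filebeat-* index pattern) and the helper names build_search_body and build_request are assumptions made for illustration, and the field names are taken from the example event in the question. The request is only built, not sent.

```python
import json
import urllib.request

# Assumption: Elasticsearch listens on localhost:9200 and the events live in
# indices matching filebeat-*; adjust both to your setup.
SEARCH_URL = "http://localhost:9200/filebeat-*/_search"


def build_search_body(fields):
    """Return the question's query, restricted to the given _source fields."""
    return {
        "query": {"match_all": {}},
        "size": 1,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "_source": list(fields),
    }


def build_request(fields, url=SEARCH_URL):
    """Build (but do not send) the HTTP request for the search."""
    data = json.dumps(build_search_body(fields)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Field names taken from the example event in the question.
    body = build_search_body(["id_orig_h", "id_resp_h", "conn_state"])
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d as-is, or the request can be sent with urllib.request.urlopen(build_request([...])).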

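For versions before 5, you can do this with a fields parameter:

Your query can pass "fields": ["field1","field2"...] at the root level of the query. The format of the response will be different, but it will work.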

{
    "query": {
        "match_all": {}
    },
    "size": 1,
    "fields": ["a", "b"],
    ...

This will produce output like this:

{ 
    "took": 9, 
    "timed_out": false, 
    "_shards": { 
    "total": 1, 
    "successful": 1, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 2077, 
    "max_score": 1, 
    "hits": [ 
     { 
     "_index": "xxx", 
     "_type": "xxx", 
     "_id": "xxxx", 
     "_score": 1, 
     "fields": { 
      "a": [ 
      0 
      ], 
      "b": [ 
      "xyz" 
      ] 
     } 
     } 
    ] 
    } 
} 

The fields will always be arrays (as of the 1.0 API), and there is no way to change that, because Elasticsearch is inherently multi-value aware.
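Since the arrays cannot be turned off on the Elasticsearch side, one option is to unwrap them in the client. This is only a sketch under that assumption: unwrap_fields is a hypothetical helper, not part of any Elasticsearch client, and it is applied to the hit from the example response above.

```python
def unwrap_fields(hit):
    """Flatten a pre-5.x hit's "fields" section.

    Single-element arrays become plain values; longer (or empty) arrays are
    left as lists so multi-valued fields are not silently truncated.
    """
    flat = {}
    for name, values in hit.get("fields", {}).items():
        if isinstance(values, list) and len(values) == 1:
            flat[name] = values[0]
        else:
            flat[name] = values
    return flat


# The hit from the example response above.
hit = {
    "_index": "xxx",
    "_type": "xxx",
    "_id": "xxxx",
    "_score": 1,
    "fields": {"a": [0], "b": ["xyz"]},
}
print(unwrap_fields(hit))  # {'a': 0, 'b': 'xyz'}
```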

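Running 5.2, I actually get an error from that: { "error": { "root_cause": [ { "type": "parsing_exception", "reason": "The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored", "line": 6, "col": 13 } "status": 400 } – BenjaFriend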

Did you try using 'stored_fields' instead of 'fields'? (I wasn't aware of the 5.x API change.) – Alcanzar

I did, and I just get output with no fields. – BenjaFriend
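The empty result in the last comment is what one would expect if none of the fields are mapped with "store": true, which is the default: stored_fields only returns fields that are stored separately from _source. Assuming that is the case here, _source filtering from the answer is the option left on 5.x, and the client then only has to read _source from each hit. A minimal sketch, where source_documents is a hypothetical helper name and the sample response is a trimmed copy of the one in the answer:

```python
def source_documents(response):
    """Return the _source dict of every hit in a search response."""
    return [hit.get("_source", {}) for hit in response["hits"]["hits"]]


# Trimmed copy of the _source-filtered response shown in the answer.
response = {
    "took": 2,
    "timed_out": False,
    "hits": {
        "total": 1,
        "max_score": 1.0,
        "hits": [
            {
                "_index": "xxx",
                "_type": "xxx",
                "_id": "xxx",
                "_score": 1.0,
                "_source": {"a": 1, "b": "2"},
            }
        ],
    },
}
print(source_documents(response))  # [{'a': 1, 'b': '2'}]
```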