Index the content of web URLs into Elasticsearch/Kibana

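I have scraped the links/sub-links of more than 500 websites using Beautiful Soup + Python, and now I want to index all the content/text of these URLs in Elasticsearch. Is there any tool that can help me index them directly with the Elasticsearch/Kibana stack?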

Please point me in the right direction. I tried searching Google and found Logstash, but it seems to work with a single URL only.
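If Logstash is not a requirement, the scraped pages can also be sent straight to Elasticsearch through its `_bulk` REST endpoint, which accepts newline-delimited JSON: one action line followed by one document line per page. The sketch below uses only the Python standard library and assumes Elasticsearch 7 or later listening on `http://localhost:9200` (hence no `_type` in the action line). The index name is borrowed from the Logstash example in the answer; the `url`/`content` field names, the SHA-1-of-URL document id, and the helper names are hypothetical choices, not something from this thread.

```python
import hashlib
import json
import urllib.request

INDEX = "my_crawler_urls"  # index name borrowed from the Logstash example; adjust as needed


def build_bulk_body(pages, index=INDEX):
    """Return an NDJSON string for the Elasticsearch _bulk API.

    `pages` is an iterable of (url, text) pairs, e.g. produced by a
    Beautiful Soup crawler. Each page becomes an action line followed by
    a source line; the whole body must end with a newline.
    """
    lines = []
    for url, text in pages:
        # Hash the URL so the _id stays short and re-indexing a page overwrites it
        doc_id = hashlib.sha1(url.encode("utf-8")).hexdigest()
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"url": url, "content": text}))
    return "\n".join(lines) + "\n"


def send_bulk(body, es_url="http://localhost:9200"):
    """POST the body to <es_url>/_bulk and return the parsed JSON response."""
    req = urllib.request.Request(
        es_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    pages = [("http://example.com/", "Example Domain ...")]
    print(build_bulk_body(pages), end="")
    # send_bulk(build_bulk_body(pages))  # uncomment when a cluster is running
```

Hashing the URL keeps `_id` short and makes a re-crawl overwrite the existing document instead of adding a duplicate. For 500+ sites it is better to send several smaller batches than one huge request; the official `elasticsearch` Python client's `helpers.bulk` handles that chunking for you.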


2017-03-06 Anand

I think I can try the following link for reference: http://stackoverflow.com/questions/13647406/how-to-index-dump-of-html-files-to-elasticsearch :) – Anand
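Whichever route is used, it is usually the visible text of each page, rather than the raw HTML, that should end up in the index. Beautiful Soup's `get_text()` already does this in the asker's setup; as a dependency-free alternative, here is a minimal sketch based on the standard library's `html.parser`, assuming each page has already been downloaded as a string. The `TextExtractor` and `html_to_text` names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # Join the chunks with spaces and collapse any remaining whitespace runs
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Script and style contents are dropped and whitespace is collapsed, so `html_to_text("<p>Hello   <b>world</b></p>")` gives `"Hello world"`.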

Alternatively, you could add a Logstash agent that listens to your crawler's output and feeds Elasticsearch. – Adonis

Could you please provide some sample reference code for doing this? – Anand

For a Logstash reference, see: https://www.elastic.co/guide/en/logstash/current/getting-started-with-logstash.html

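Otherwise, write your crawler's output to a file, for example one URL per line. You could then use the following Logstash configuration; in this example, Logstash reads each line as a message and sends it to the Elasticsearch servers on host1 and host2.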

input {
    file {
        path => "/an/absolute/path"    # the path has to be absolute
        start_position => "beginning"  # read the file from its start the first time Logstash sees it
    }
}

output {
    elasticsearch {
        hosts => ["host1:port1", "host2:port2"]  # usually DNS names (localhost in the simplest case); the default port is 9200
        index => "my_crawler_urls"
        workers => 4  # tune to your resources/expected throughput; not supported on newer Logstash versions, where pipeline.workers is tuned instead
    }
}

Of course, you will probably want to filter or post-process your crawler's output, and Logstash lets you do that with codecs and/or filters.
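One way to use a codec here: instead of one bare URL per line, the crawler can write one JSON object per page, and adding `codec => "json"` inside the `file { ... }` block makes Logstash turn each line into an event with `url` and `content` fields rather than a single `message` string. A minimal Python sketch of the crawler side; the function names and the `url`/`content` field names are hypothetical.

```python
import json


def to_json_lines(pages):
    """Serialize (url, text) pairs as one JSON object per line."""
    # json.dumps escapes newlines inside the text, so each page stays on one line
    return "".join(
        json.dumps({"url": url, "content": text}, ensure_ascii=False) + "\n"
        for url, text in pages
    )


def append_json_lines(pages, path):
    """Append the pages to `path`, the file watched by Logstash's file input."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_json_lines(pages))
```

`path` should be the same absolute path given to the `file` input. Appending (rather than rewriting) the file fits the way the file input follows new lines as they are added.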


2017-03-07 12:31:33 Adonis
