_http_request_failure in logstash while using http_poller


(Navneet Mathpal) #1

Hi ,

I am getting some json documents from http_poller

input {
  http_poller {
    urls => {
      "localhost" => "http://localhost:9200"
    }
    interval => 10
  }
}
After it reads all the documents, it shows _http_request_failure.

1. Is it because Logstash pings the URL every 10 seconds, and if it does not find any new document there it shows this _http_request_failure?
2. If a new doc gets added at the URL, will http_poller be able to pick it up in real time, without re-reading the older docs?

Thanks


(Magnus Bäck) #2

Not sure what you mean by older and newer docs. http://localhost:9200 will only return Elasticsearch's rarely-changing status document and the http_poller just makes an HTTP request and passes the results to Logstash.
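For reference, a complete minimal pipeline looks roughly like this (the URL, interval, and codec are just placeholders, and newer plugin versions use a schedule option instead of interval):

```
input {
  http_poller {
    urls => {
      "status" => "http://localhost:9200"
    }
    # Poll every 10 seconds; newer versions use
    # schedule => { every => "10s" } instead.
    interval => 10
    codec => "json"
  }
}
output {
  stdout { codec => rubydebug }
}
```

Every poll fetches the full response again; the plugin does not diff responses between polls.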


(Navneet Mathpal) #3

I mean, if I have 200 JSON docs available at my URL "www.example.com/jsonfile", then after running http_poller I will get 200 JSON docs. If one new doc gets added, will http_poller take only that new doc, or will it fetch all 201 docs again?

Why does this _http_request_failure come?


(Magnus Bäck) #4

Okay, so http://localhost:9200 was just a randomly picked URL? That was not obvious.

After running http_poller I will get 200 JSON docs. If one new doc gets added, will http_poller take only that new doc, or will it fetch all 201 docs again?

http_poller does not maintain any state, i.e. it has no idea which documents are new.

Why does this _http_request_failure come?

The resulting event's @metadata field (normally not emitted by outputs but available with e.g. stdout { codec => rubydebug }) should contain details about the failure.
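For example, a debug output like this makes @metadata visible (the exact layout of the failure details depends on the plugin version):

```
output {
  stdout {
    # Without metadata => true, the @metadata field is
    # hidden from the rubydebug output.
    codec => rubydebug { metadata => true }
  }
}
```

Run the pipeline with this output and inspect the event printed for a failed poll; the error message and request details should be in there.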


(Navneet Mathpal) #5

Thank you @magnusbaeck

So we can handle the repeated docs using document_id.
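Something like this is what I mean (a sketch; the source field and index name are guesses, and older fingerprint plugin versions may require a key option for SHA methods):

```
filter {
  # Derive a stable ID from the document content so re-polled
  # duplicates overwrite themselves instead of piling up.
  fingerprint {
    source => ["message"]
    method => "SHA256"
    target => "[@metadata][fingerprint]"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "polled-docs"
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

Re-indexing the same 200 docs then just overwrites the same 200 IDs instead of creating duplicates.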

1. If I have 1 million JSON records available at my URL, and the URL is polled every 10 seconds (and let us suppose http_poller reads 100 docs/sec, so one full pass takes about 10,000 seconds), does that mean http_poller can never read the whole file?


(Magnus Bäck) #6

Your architecture doesn't sound very sustainable, at least not with the update frequency you have in mind. Those 1 million documents must weigh, what, 100 MB or more? You really want some kind of incremental polling ("give me everything that's been updated since X"), or to have the origin system send updates to a broker that you can read from.
