Http_poller 's weird behavior with polling URLs

http_poller with logstash 1.5.1 , ES 1.5.2, on Java 7

I have configured http_poller to poll the following endpoint
http://:10710/jolokia/read/:type=Registrar/InstanceName
(I am using jolokia to access the JMX application endpoint). I am trying to use http_poller to monitor the state of applications running on various machines.

I also am using a perfstat endpoint to monitor the state of a server as well and also monitoring a es cluster.

right now my http_poller config is very simple and straight forward :

input {
  http_poller {
    urls => {
      poll_point1 => "http://<host>:10710/jolokia/read/<classpath>:type=Registrar/InstanceName"
      poll_point2 => "http://<host>:8080/service-name/service_endpoint/ping"
      poll_point3 => "http:/es.cluster.address/"
      poll_point4 => "http://<host.address.url>:1055/xstatus"
   }
    connect_timeout => 15
    request_timeout => 120
    interval => 60
    socket_timeout => 60
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}

output {
# Use stdout in debug mode again to see what logstash makes of the event.
  stdout {
    codec => "rubydebug"
  }
}

With this configuration I get the following behavior/response :

every other response for the poll_point1, poll_point2 is a failure implying,of course
So the response is like the following :
Success - Failure - Success - Failure - Success - Failure ......
where Success = a response code of 200 and The Failure is like :

"error": ":10710 failed to respond",
or
"error": ":8080 failed to respond",

Its always a Success for poll_point3 and poll_point4

However, if I were to run a curl command like so :

while true; do curl -XGET "http://<host>:10710/jolokia/read/<class.name>:type=Registrar/InstanceName"; echo "\n"; sleep 30; done

the response is always a Success (a response code of 200)

My question is, why does the response alternate between Success and Failure for the two poll points (one for an apache Tomcat server endpoint and the other for a Jolokia, JMX endpoint)

Thanks

Ramdev

That is really bizarre. To be honest I don't have any strong idea of what's going on here. Can you please post the full error events?

A couple things to try:

  1. Try setting "keepalive" => false .
  2. Always set the request_timeout to be < interval (though I doubt that's the issue here). If it's longer you could theoretically exhaust the threadpool if requests are slow.

Hi Andrew:
Thanks for responding....
I did try it out with the following config parameters:

I am unable to completely post the config and the debug output: so here is the GISt of it :smile:

(direct link : https://gist.github.com/agastya71/9097d9f494e79dea4095 )

let me know what you can make off it.

Thanks much

Ramdev

Ramdev, it would be more useful to see the full error reported in the error events, not the debug logs. Can you post some sample error events here?

HI Andrew:
How do I capture the errors ? I typically have been capturing the output of logstash to figure out the errors. I am not even sure of how to configure logstash to show me the actual errors....

Ramdev

So this is what I found and rather by accident...
If I reduce the interval value to 10 seconds, it seems to be working great... no failures at all...
I guess, perhaps, the keepalive variable value is not being honored ?

for now, I have to scan every 10 seconds for the polling to show success... (even tho the service never actually fails, the poll/scan time has to be 10 seconds for the poll to work every time)

Let me know if you find anything not right with this

Thanks

Ramdev

Hmm, we haven't received any other reports of this. Which webserver are you using? Are you sure there aren't any weird quota or other issues broken here?

keepalive is just based on ApacheCommons HTTP, so I doubt there's a bug there, though it's possible. Are you not seeing the correct headers on the server when using it?

hello, I just wanted to mention that I have exactly the same problem.
I wanted to test a weather ws. Not knowing http_poller I started with "exec " which launches a curl .
This works great , every time.
I thought that http_poller should be more native and cleaner.
But in this case, I have an error exactly 1 times 2 ! ( interval = > 1200)
how a call can impact the next one, half an hour after ?

If by cons I descend the interval below 10 , it seems to work .

logstash 2.0.0 , ES 2.0, java 1.8.0_65-b17 on windows 8 (Fr) running in a powershell

I attached the config ( interval = > 1200) .
I change the API key (it's mine) , you can have one on their website for free if you want to test.

input {
  http_poller {
    urls => {
      test1 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
        url => "http://api.openweathermap.org/data/2.5/weather?q=leforest,fr&mode=json&units=metric&lang=fr&APPID=643c5fe160b615a_FAKE_442cb6bb0325"
        headers => {
          Accept => "application/json"
        }
		automatic_retries => 3
		keepalive => false
        # auth => {
          # user => "AzureDiamond"
          # password => "hunter2"
        # }
      }
    }
    request_timeout => 500
    interval => 1200
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}

output {
elasticsearch {
	index => "meteo-poller-%{+YYYY.MM.dd}"
}
}

yeah This is still a problem. Off late I have started seeing other issues
as well. I keep getting a 400 error when trying to persist the response of
a health check)_heath endpoint) on an ES instance. If I remove the health
check URL , everything goes back to normal.

Ramdev

@Ari_dev @Ramdev_Wudali thanks for your reports. I'll try to find some time in the next week to troubleshoot this. Apologies for the bug. @Ari_dev thanks for posting your config. Which version of Elasticsearch is this pointing at?

When this bug was first posted there was no reproducible test case, but now that there is I hope to be able to fix it :smile:

Hello, thx for looking at it.
My elasticsearch is a 2.0.0.
regards

I've moved this to the issue here: https://github.com/logstash-plugins/logstash-input-http_poller/issues/39

@Ari_dev @Ramdev_Wudali I'm having trouble reproducing this behavior. Can you post the EXACT errors using --debug in logstash? You may have to put a log4j.properties file in there using the config below to see the apache commons HTTP client debug info

# Root logger option
log4j.rootLogger=INFO, stdout

# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

I will get something to you as soon as possible...

Ramdev

Thanks!

Hello, can't have apache log since i point to an external api on http://api.openweathermap.org/

i ran .\logstash.bat --debug -f ..\meteo_poller.conf > c:\temp\debug.log
you'll find in the debug.log . 2 correct call and 2 failed.

Again, i scramble my API key but you can have one for free on their website.

For information i try to reduce the interval => 200 to interval => 20 (didn' want to wait too long :smile: )
but in this case : no error !?
I didn't try all value but with interval => 60 the bug occurs again (time to time, not 1/2).

Hum my file is too big ... put it in a pastebin http://pastebin.com/21TTtZf3

by some test: at work, through a proxy, the bug seems more difficult to reproduce. at home, with a direct connection, the bug is present.
cache or connection "keep alive" issue ?

kibana discover :
Time http_request_failure.error http_poller_metadata.response_message http_poller_metadata.response_headers.x-source
November 20th 2015, 22:04:36.666 - OK redis
November 20th 2015, 22:02:36.575 api.openweathermap.org:80 failed to respond - -
November 20th 2015, 22:00:37.373 - OK redis
November 20th 2015, 21:58:36.584 api.openweathermap.org:80 failed to respond - -
November 20th 2015, 21:56:37.264 - OK redis
November 20th 2015, 21:54:36.574 api.openweathermap.org:80 failed to respond - -
November 20th 2015, 21:52:37.311 - OK redis

Hi Andrew:
Sorry for the delay, here is the gist of the errors/debug from logstash
and the config file I used.

As you can see, every other call to certain URLs fail. (the first call
succeeds and then it fails and then it succeeds and then it fails and so
on....)
The failure is indicated by the http_failure (ignore the
json_parsefailures)

let me know if you need more

Ramdev

OK! So the good news here is I found the bug!

There are three separate things going on here:

  1. api.openweathermap.org is broken and does not properly support KeepAlive ( I suggest you file a bug).
  2. Your config specifies keepalive as a per-url setting which is incorrect and does nothing ( it must be at the top level)
  3. Your config specifies automatic_retries but also as a per-url setting which is also incorrect (it does nothing).

If you move those settings to the top level everything should work.

This is very confusing however, and hardly your fault for the mixup. I've opened this github issue to fix the parameter validation: https://github.com/logstash-plugins/logstash-input-http_poller/issues/40

This issue is probably similar to what @Ari_dev was seeing. I recommend you disable keepalive.