Http_poller 's weird behavior with polling URLs

Ramdev_Wudali · September 3, 2015, 2:37pm

http_poller with logstash 1.5.1 , ES 1.5.2, on Java 7

I have configured http_poller to poll the following endpoint
http://:10710/jolokia/read/:type=Registrar/InstanceName
(I am using jolokia to access the JMX application endpoint). I am trying to use http_poller to monitor the state of applications running on various machines.

I also am using a perfstat endpoint to monitor the state of a server as well and also monitoring a es cluster.

right now my http_poller config is very simple and straight forward :

input {
  http_poller {
    urls => {
      poll_point1 => "http://<host>:10710/jolokia/read/<classpath>:type=Registrar/InstanceName"
      poll_point2 => "http://<host>:8080/service-name/service_endpoint/ping"
      poll_point3 => "http:/es.cluster.address/"
      poll_point4 => "http://<host.address.url>:1055/xstatus"
   }
    connect_timeout => 15
    request_timeout => 120
    interval => 60
    socket_timeout => 60
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}

output {
# Use stdout in debug mode again to see what logstash makes of the event.
  stdout {
    codec => "rubydebug"
  }
}

With this configuration I get the following behavior/response :

every other response for the poll_point1, poll_point2 is a failure implying,of course
So the response is like the following :
Success - Failure - Success - Failure - Success - Failure ......
where Success = a response code of 200 and The Failure is like :

"error": ":10710 failed to respond",
or
"error": ":8080 failed to respond",

Its always a Success for poll_point3 and poll_point4

However, if I were to run a curl command like so :

while true; do curl -XGET "http://<host>:10710/jolokia/read/<class.name>:type=Registrar/InstanceName"; echo "\n"; sleep 30; done

the response is always a Success (a response code of 200)

My question is, why does the response alternate between Success and Failure for the two poll points (one for an apache Tomcat server endpoint and the other for a Jolokia, JMX endpoint)

Thanks

Ramdev

andrewvc · September 14, 2015, 3:11pm

That is really bizarre. To be honest I don't have any strong idea of what's going on here. Can you please post the full error events?

A couple things to try:

Try setting "keepalive" => false .
Always set the request_timeout to be < interval (though I doubt that's the issue here). If it's longer you could theoretically exhaust the threadpool if requests are slow.

Ramdev_Wudali · September 14, 2015, 4:20pm

Hi Andrew:
Thanks for responding....
I did try it out with the following config parameters:

I am unable to completely post the config and the debug output: so here is the GISt of it

gist.github.com

https://gist.github.com/agastya71/9097d9f494e79dea4095

debug output.txt

Reading config file {:file=>"logstash/agent.rb", :level=>:debug, :line=>"301", :method=>"local_config"}
Compiled pipeline code:
        @inputs = []
        @filters = []
        @outputs = []
        @periodic_flushers = []
        @shutdown_flushers = []

          @input_http_poller_1 = plugin("input", "http_poller", LogStash::Util.hash_merge_many({ "urls" => {("url_to_test") => {("url") => ("http://app.server.name:10710/jolokia/read/example.package.name:type=Registrar/InstanceName"), ("keepalive") => ("false"), ("method") => ("get"), ("cookies") => ("false"), ("automatic_retries") => 1}} }, { "connect_timeout" => 15 }, { "request_timeout" => 120 }, { "interval" => 60 }, { "socket_timeout" => 60 }, { "codec" => ("json") }, { "metadata_target" => ("http_poller_metadata") }))

This file has been truncated. show original

http_poller.conf

input {
  http_poller {
    urls => {
      url_to_test => {
          url => "http:/app.server.name:10710/jolokia/read/example.package.name:type=Registrar/InstanceName"
          keepalive => false
          method => get
          cookies => false
          automatic_retries => 1
      }

This file has been truncated. show original

(direct link : https://gist.github.com/agastya71/9097d9f494e79dea4095 )

let me know what you can make off it.

Thanks much

Ramdev

andrewvc · September 14, 2015, 4:37pm

Ramdev, it would be more useful to see the full error reported in the error events, not the debug logs. Can you post some sample error events here?

Ramdev_Wudali · September 14, 2015, 4:40pm

HI Andrew:
How do I capture the errors ? I typically have been capturing the output of logstash to figure out the errors. I am not even sure of how to configure logstash to show me the actual errors....

Ramdev

Ramdev_Wudali · September 14, 2015, 8:56pm

So this is what I found and rather by accident...
If I reduce the interval value to 10 seconds, it seems to be working great... no failures at all...
I guess, perhaps, the keepalive variable value is not being honored ?

for now, I have to scan every 10 seconds for the polling to show success... (even tho the service never actually fails, the poll/scan time has to be 10 seconds for the poll to work every time)

Let me know if you find anything not right with this

Thanks

Ramdev

andrewvc · September 29, 2015, 12:47pm

Hmm, we haven't received any other reports of this. Which webserver are you using? Are you sure there aren't any weird quota or other issues broken here?

keepalive is just based on ApacheCommons HTTP, so I doubt there's a bug there, though it's possible. Are you not seeing the correct headers on the server when using it?

Ari_dev · November 11, 2015, 6:03am

hello, I just wanted to mention that I have exactly the same problem.
I wanted to test a weather ws. Not knowing http_poller I started with "exec " which launches a curl .
This works great , every time.
I thought that http_poller should be more native and cleaner.
But in this case, I have an error exactly 1 times 2 ! ( interval = > 1200)
how a call can impact the next one, half an hour after ?

If by cons I descend the interval below 10 , it seems to work .

logstash 2.0.0 , ES 2.0, java 1.8.0_65-b17 on windows 8 (Fr) running in a powershell

I attached the config ( interval = > 1200) .
I change the API key (it's mine) , you can have one on their website for free if you want to test.

input {
  http_poller {
    urls => {
      test1 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
        url => "http://api.openweathermap.org/data/2.5/weather?q=leforest,fr&mode=json&units=metric&lang=fr&APPID=643c5fe160b615a_FAKE_442cb6bb0325"
        headers => {
          Accept => "application/json"
        }
		automatic_retries => 3
		keepalive => false
        # auth => {
          # user => "AzureDiamond"
          # password => "hunter2"
        # }
      }
    }
    request_timeout => 500
    interval => 1200
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}

output {
elasticsearch {
	index => "meteo-poller-%{+YYYY.MM.dd}"
}
}

Ramdev_Wudali · November 11, 2015, 12:43pm

yeah This is still a problem. Off late I have started seeing other issues
as well. I keep getting a 400 error when trying to persist the response of
a health check)_heath endpoint) on an ES instance. If I remove the health
check URL , everything goes back to normal.

Ramdev

andrewvc · November 11, 2015, 5:09pm

@Ari_dev @Ramdev_Wudali thanks for your reports. I'll try to find some time in the next week to troubleshoot this. Apologies for the bug. @Ari_dev thanks for posting your config. Which version of Elasticsearch is this pointing at?

When this bug was first posted there was no reproducible test case, but now that there is I hope to be able to fix it

Ari_dev · November 11, 2015, 5:14pm

Hello, thx for looking at it.
My elasticsearch is a 2.0.0.
regards

andrewvc · November 16, 2015, 9:28pm

I've moved this to the issue here: https://github.com/logstash-plugins/logstash-input-http_poller/issues/39

Andrew_Cholakian1 · November 19, 2015, 6:06pm

@Ari_dev @Ramdev_Wudali I'm having trouble reproducing this behavior. Can you post the EXACT errors using --debug in logstash? You may have to put a log4j.properties file in there using the config below to see the apache commons HTTP client debug info

# Root logger option
log4j.rootLogger=INFO, stdout

# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

Ramdev_Wudali · November 19, 2015, 7:00pm

I will get something to you as soon as possible...

Ramdev

Andrew_Cholakian1 · November 19, 2015, 8:45pm

Thanks!

Ari_dev · November 19, 2015, 9:25pm

Hello, can't have apache log since i point to an external api on http://api.openweathermap.org/

i ran .\logstash.bat --debug -f ..\meteo_poller.conf > c:\temp\debug.log
you'll find in the debug.log . 2 correct call and 2 failed.

Again, i scramble my API key but you can have one for free on their website.

For information i try to reduce the interval => 200 to interval => 20 (didn' want to wait too long )
but in this case : no error !?
I didn't try all value but with interval => 60 the bug occurs again (time to time, not 1/2).

Hum my file is too big ... put it in a pastebin http://pastebin.com/21TTtZf3

Ari_dev · November 20, 2015, 9:16pm

by some test: at work, through a proxy, the bug seems more difficult to reproduce. at home, with a direct connection, the bug is present.
cache or connection "keep alive" issue ?

kibana discover :
Time http_request_failure.error http_poller_metadata.response_message http_poller_metadata.response_headers.x-source
November 20th 2015, 22:04:36.666 - OK redis
November 20th 2015, 22:02:36.575 api.openweathermap.org:80 failed to respond - -
November 20th 2015, 22:00:37.373 - OK redis
November 20th 2015, 21:58:36.584 api.openweathermap.org:80 failed to respond - -
November 20th 2015, 21:56:37.264 - OK redis
November 20th 2015, 21:54:36.574 api.openweathermap.org:80 failed to respond - -
November 20th 2015, 21:52:37.311 - OK redis

Ramdev_Wudali · November 23, 2015, 2:40pm

Hi Andrew:
Sorry for the delay, here is the gist of the errors/debug from logstash
and the config file I used.

gist.github.com

https://gist.github.com/agastya71/d07819f2fd9adeb122ce

configuration for http_polling

#   THE URLs are not exposed to the outside world so you might not be able to use them... 
#    the problem causing URLs are : a tomcat webservice app.
#    a JMX endpoint to veify a process is up and running.
#

input {
  http_poller {
    urls => {
      sdp_sane_service => "http://staton-09.int.westgroup.com:8080/monitor-sa/rest/sdp-sane/cluster/ping"
      sdp_sane_db => "http://staton-09.int.westgroup.com:8080/monitor-sa/rest/sdp-sane/cluster/db"

This file has been truncated. show original

http_poller errors

{:timestamp=>"2015-11-23T07:46:12.519000-0600", :message=>"Reading config file", :file=>"logstash/agent.rb", :level=>:debug, :line=>"301", :method=>"local_config"}
{:timestamp=>"2015-11-23T07:46:12.917000-0600", :message=>"Compiled pipeline code:\n        @inputs = []\n        @filters = []\n        @outputs = []\n        @periodic_flushers = []\n        @shutdown_flushers = []\n\n          @input_http_poller_1 = plugin(\"input\", \"http_poller\", LogStash::Util.hash_merge_many({ \"urls\" => {(\"sdp_sane_service\") => (\"http://staton-09.int.westgroup.com:8080/monitor-sa/rest/sdp-sane/cluster/ping\"), (\"sdp_sane_db\") => (\"http://staton-09.int.westgroup.com:8080/monitor-sa/rest/sdp-sane/cluster/db\"), (\"sdp_copro\") => (\"http://staton-10.int.westgroup.com:10510/jolokia/read/com.trgr.platform.riptide:type=Registrar/InstanceName\"), (\"sdp_JMSconsumer\") => (\"http://staton-11.int.westgroup.com:10910/jolokia/read/com.trgr.platform.riptide:type=Registrar/InstanceName\"), (\"sdp_esconsumer\") => (\"http://sdp-decahose-dev-8.amers1.cis.trcloud:10710/jolokia/read/com.trgr.platform.riptide:type=Registrar/InstanceName\"), (\"es_blip\") => (\"http://blip.sdp.int.thomsonreuters.com/\"), (\"es_logging\") => (\"http://logging.sdp.int.thomsonreuters.com/\"), (\"es_tribe_node\") => (\"http://search.sdp.int.thomsonreuters.com/\"), (\"porto_state\") => {(\"url\") => (\"http://porto.int.thomsonreuters.com:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"gnosis_state\") => {(\"url\") => (\"http://porto.int.thomsonreuters.com:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"olimp_state\") => {(\"url\") => (\"http://porto.int.thomsonreuters.com:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"sdp_decahose_dev_5_state\") => {(\"url\") => (\"http://sdp-decahose-dev-5.amers1.cis.trcloud:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"sdp_decahose_dev_11_state\") => {(\"url\") => (\"http://sdp-decahose-dev-11.amers1.cis.trcloud:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"sdp_decahose_dev_12_state\") => {(\"url\") => (\"http://sdp-decahose-dev-12.amers1.cis.trcloud:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"sdp_decahose_dev_1_state\") => {(\"url\") => (\"http://sdp-decahose-dev-1.amers1.cis.trcloud:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"sdp_decahose_dev_2_state\") => {(\"url\") => (\"http://sdp-decahose-dev-2.amers1.cis.trcloud:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"sdp_decahose_dev_3_state\") => {(\"url\") => (\"http://sdp-decahose-dev-3.amers1.cis.trcloud:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"sdp_decahose_dev_4_state\") => {(\"url\") => (\"http://sdp-decahose-dev-4.amers1.cis.trcloud:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"sanji_04_state\") => {(\"url\") => (\"http://sanji-04.int.westgroup.com:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"sanji_05_state\") => {(\"url\") => (\"http://sanji-05.int.westgroup.com:1055/xstatus\"), (\"codec\") => (\"line\")}, (\"pavlik_11_state\") => {(\"url\") => (\"http://pavlik-11.int.westgroup.com:1055/xstatus\"), (\"codec\") => (\"line\")}} }, { \"connect_timeout\" => 15 }, { \"request_timeout\" => 120 }, { \"interval\" => 60 }, { \"socket_timeout\" => 60 }, { \"metadata_target\" => (\"http_poller_metadata\") }))\n\n          @inputs << @input_http_poller_1\n\n          @output_stdout_2 = plugin(\"output\", \"stdout\", LogStash::Util.hash_merge_many({ \"codec\" => (\"rubydebug\") }))\n\n          @outputs << @output_stdout_2\n\n  def filter_func(event)\n    events = [event]\n    @logger.debug? && @logger.debug(\"filter received\", :event => event.to_hash)\n    events\n  end\n  def output_func(event)\n    @logger.debug? && @logger.debug(\"output received\", :event => event.to_hash)\n    @output_stdout_2.handle(event)\n    \n  end", :level=>:debug, :file=>"logstash/pipeline.rb", :line=>"28", :method=>"initialize"}
{:timestamp=>"2015-11-23T07:46:12.923000-0600", :message=>"Plugin not defined in namespace, checking for plugin file", :type=>"input", :name=>"http_poller", :path=>"logstash/inputs/http_poller", :level=>:debug, :file=>"logstash/plugin.rb", :line=>"133", :method=>"lookup"}
{:timestamp=>"2015-11-23T07:46:13.632000-0600", :message=>"Plugin not defined in namespace, checking for plugin file", :type=>"codec", :name=>"json", :path=>"logstash/codecs/json", :level=>:debug, :file=>"logstash/plugin.rb", :line=>"133", :method=>"lookup"}
{:timestamp=>"2015-11-23T07:46:13.642000-0600", :message=>"Using version 0.1.x codec plugin 'json'. This plugin isn't well supported by the community and likely has no maintainer.", :level=>:info, :file=>"logstash/config/mixin.rb", :line=>"223", :method=>"print_version_notice"}
{:timestamp=>"2015-11-23T07:46:13.644000-0600", :message=>"config LogStash::Codecs::JSON/@charset = \"UTF-8\"", :level=>:debug, :file=>"logstash/config/mixin.rb", :line=>"112", :method=>"config_init"}
{:timestamp=>"2015-11-23T07:46:13.650000-0600", :message=>"config LogStash::Inputs::HTTP_Poller/@urls = {\"sdp_sane_service\"=>\"http://staton-09.int.westgroup.com:8080/monitor-sa/rest/sdp-sane/cluster/ping\", \"sdp_sane_db\"=>\"http://staton-09.int.westgroup.com:8080/monitor-sa/rest/sdp-sane/cluster/db\", \"sdp_copro\"=>\"http://staton-10.int.westgroup.com:10510/jolokia/read/com.trgr.platform.riptide:type=Registrar/InstanceName\", \"sdp_JMSconsumer\"=>\"http://staton-11.int.westgroup.com:10910/jolokia/read/com.trgr.platform.riptide:type=Registrar/InstanceName\", \"sdp_esconsumer\"=>\"http://sdp-decahose-dev-8.amers1.cis.trcloud:10710/jolokia/read/com.trgr.platform.riptide:type=Registrar/InstanceName\", \"es_blip\"=>\"http://blip.sdp.int.thomsonreuters.com/\", \"es_logging\"=>\"http://logging.sdp.int.thomsonreuters.com/\", \"es_tribe_node\"=>\"http://search.sdp.int.thomsonreuters.com/\", \"porto_state\"=>{\"url\"=>\"http://porto.int.thomsonreuters.com:1055/xstatus\", \"codec\"=>\"line\"}, \"gnosis_state\"=>{\"url\"=>\"http://porto.int.thomsonreuters.com:1055/xstatus\", \"codec\"=>\"line\"}, \"olimp_state\"=>{\"url\"=>\"http://porto.int.thomsonreuters.com:1055/xstatus\", \"codec\"=>\"line\"}, \"sdp_decahose_dev_5_state\"=>{\"url\"=>\"http://sdp-decahose-dev-5.amers1.cis.trcloud:1055/xstatus\", \"codec\"=>\"line\"}, \"sdp_decahose_dev_11_state\"=>{\"url\"=>\"http://sdp-decahose-dev-11.amers1.cis.trcloud:1055/xstatus\", \"codec\"=>\"line\"}, \"sdp_decahose_dev_12_state\"=>{\"url\"=>\"http://sdp-decahose-dev-12.amers1.cis.trcloud:1055/xstatus\", \"codec\"=>\"line\"}, \"sdp_decahose_dev_1_state\"=>{\"url\"=>\"http://sdp-decahose-dev-1.amers1.cis.trcloud:1055/xstatus\", \"codec\"=>\"line\"}, \"sdp_decahose_dev_2_state\"=>{\"url\"=>\"http://sdp-decahose-dev-2.amers1.cis.trcloud:1055/xstatus\", \"codec\"=>\"line\"}, \"sdp_decahose_dev_3_state\"=>{\"url\"=>\"http://sdp-decahose-dev-3.amers1.cis.trcloud:1055/xstatus\", \"codec\"=>\"line\"}, \"sdp_decahose_dev_4_state\"=>{\"url\"=>\"http://sdp-decahose-dev-4.amers1.cis.trcloud:1055/xstatus\", \"codec\"=>\"line\"}, \"sanji_04_state\"=>{\"url\"=>\"http://sanji-04.int.westgroup.com:1055/xstatus\", \"codec\"=>\"line\"}, \"sanji_05_state\"=>{\"url\"=>\"http://sanji-05.int.westgroup.com:1055/xstatus\", \"codec\"=>\"line\"}, \"pavlik_11_state\"=>{\"url\"=>\"http://pavlik-11.int.westgroup.com:1055/xstatus\", \"codec\"=>\"line\"}}", :level=>:debug, :file=>"logstash/config/mixin.rb", :line=>"112", :method=>"config_init"}
{:timestamp=>"2015-11-23T07:46:13.653000-0600", :message=>"config LogStash::Inputs::HTTP_Poller/@connect_timeout = 15", :level=>:debug, :file=>"logstash/config/mixin.rb", :line=>"112", :method=>"config_init"}
{:timestamp=>"2015-11-23T07:46:13.655000-0600", :message=>"config LogStash::Inputs::HTTP_Poller/@request_timeout = 120", :level=>:debug, :file=>"logstash/config/mixin.rb", :line=>"112", :method=>"config_init"}
{:timestamp=>"2015-11-23T07:46:13.657000-0600", :message=>"config LogStash::Inputs::HTTP_Poller/@interval = 60", :level=>:debug, :file=>"logstash/config/mixin.rb", :line=>"112", :method=>"config_init"}

This file has been truncated. show original

As you can see, every other call to certain URLs fail. (the first call
succeeds and then it fails and then it succeeds and then it fails and so
on....)
The failure is indicated by the http_failure (ignore the
json_parsefailures)

let me know if you need more

Ramdev

Andrew_Cholakian1 · December 15, 2015, 9:19pm

OK! So the good news here is I found the bug!

There are three separate things going on here:

api.openweathermap.org is broken and does not properly support KeepAlive ( I suggest you file a bug).
Your config specifies keepalive as a per-url setting which is incorrect and does nothing ( it must be at the top level)
Your config specifies automatic_retries but also as a per-url setting which is also incorrect (it does nothing).

If you move those settings to the top level everything should work.

This is very confusing however, and hardly your fault for the mixup. I've opened this github issue to fix the parameter validation: https://github.com/logstash-plugins/logstash-input-http_poller/issues/40

Andrew_Cholakian1 · December 15, 2015, 9:20pm

This issue is probably similar to what @Ari_dev was seeing. I recommend you disable keepalive.

Topic		Replies	Views
HTTP Poller not working - No errors Logstash	5	1949	May 22, 2018
Http_poller is not working for me, please help Logstash	20	7639	April 26, 2018
Help with HTTP_Poller Logstash	2	887	July 6, 2017
_http_request_failure in logstash while using http_poller Logstash	6	1738	July 6, 2017
Http_request_failure when trying to get logs with http_poller plugin logstash Logstash	10	2617	July 6, 2017

Http_poller 's weird behavior with polling URLs

Related topics