I am wondering if Heartbeat allows us to take the HTTP response and scrape it for values, or to add a script that does post-processing of the capture. I need to monitor pages behind a login and capture values from those pages to validate uptime. In the future the customer will have simple JSON checks on an API, but this is for their legacy platform. Today we use an armada of Python scrape scripts in cron to achieve the same thing. Thanks in advance for your help and advice.
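For reference, here's roughly the shape of one of those cron scripts — a minimal sketch where the URLs, form fields, and the value regex are placeholders, not the real site:

```python
#!/usr/bin/env python
"""Minimal sketch of a cron-style login-and-scrape check.
All URLs, form field names, and the value regex are hypothetical
placeholders for the real legacy site."""
import re
import sys

import requests

LOGIN_URL = "https://legacy.example.com/login"    # hypothetical entry point
STATUS_URL = "https://legacy.example.com/status"  # hypothetical page behind the login
VALUE_RE = re.compile(r'id="queue-depth">(\d+)<')  # hypothetical DOM value

def check():
    # A Session keeps the auth cookie from page 1 for the request to page 2.
    with requests.Session() as s:
        s.post(LOGIN_URL, data={"user": "monitor", "pass": "secret"}, timeout=10)
        resp = s.get(STATUS_URL, timeout=10)
        resp.raise_for_status()
        m = VALUE_RE.search(resp.text)
        if m is None:
            print("DOWN: expected value not found in DOM")
            return 1
        print("UP: queue-depth=%s" % m.group(1))
        return 0

if __name__ == "__main__":
    sys.exit(check())
```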
Heartbeat is basically ping (a check for service availability). Scraping, and potentially extracting more metrics from a service, sounds like a job for the Metricbeat http module (available in nightly builds). It supports JSON only.
As an alternative to Beats, you can use Logstash with http_poller to do custom processing/scraping.
Thanks for your ideas @steffens. In my case I would be authenticating on page 1 (the entry point), which takes me to page 2 (the actual page, with values in the DOM rather than JSON). Can you think of any way to capture and process that response? I'm wondering if this is a case for building a custom beat off of Metricbeat, or something else. Happy to take it into the Metricbeat tag for more input. I love the scheduling in http_poller and Metricbeat but simply need to take the capture a bit further.
Phew, this is somewhat more complicated. For systems like Beats/Logstash you'd basically need 'recursive inputs' to create follow-up tasks (Heartbeat is internally architected this way; in Metricbeat/Logstash one could just do another HTTP request before reporting an event).
Besides the scraping (which sounds like the easy part, e.g. via regex, a JSON/XML parser, a soup parser, stripping all tags plus a regex, ...), my problem is basically defining/configuring the follow-up tasks based on the results of page 1 in a generic and reusable way. E.g. is some token passed in some JSON/XML/cookie/HTTP header for use with page 2? When stepping through the pages, which parameters must be passed/stored/checked between every single step? See the sketch below.
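To make the problem concrete, here's a hand-rolled sketch of such a two-step check in Python; the endpoints, the JSON token field, and the X-Auth-Token header are all assumptions about what a legacy site might do:

```python
"""Sketch of the 'follow-up task' problem: each step may yield a
token (cookie, header, JSON field) that the next step has to carry.
Endpoint names, the token field, and the header are assumptions."""
import requests

session = requests.Session()  # cookies from step 1 carry over automatically

# Step 1: the entry point returns a token (here assumed to be in a JSON body).
r1 = session.post(
    "https://legacy.example.com/login",      # hypothetical
    data={"user": "monitor", "pass": "secret"},
    timeout=10,
)
token = r1.json().get("token")               # could equally be a cookie or header

# Step 2: the token from step 1 is passed explicitly as a header.
r2 = session.get(
    "https://legacy.example.com/app/page2",  # hypothetical
    headers={"X-Auth-Token": token},
    timeout=10,
)
print(r2.status_code, len(r2.text))
```

Making the "which field from step 1 goes into which header/cookie of step 2" part configurable per step is exactly the generic piece that is missing.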
The login-page thing makes me wonder: why do you need a login page in front of your monitoring? What's wrong with HTTP authentication?
lol... excellent question on the login and auth. The new versions of the sites will return JSON on auth, but many legacy sites still have to be "crawled" or checked for data. This is multi-tenant, so as the legacy sites age out I can forget that headache, but I have to deal with it for now. I can still fire the existing Python scripts that do the work if a callout is possible via an "external command" output, and then retrieve the result or ingest the resulting log output.
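E.g., a minimal sketch of how one of the existing scripts could write one JSON document per check to a log file that Filebeat or Logstash could then tail; the file name and field names here are just assumptions on my part, not a Beats convention:

```python
"""Sketch: the existing scrape script appends one JSON document per
check to a newline-delimited log that a shipper can tail.
The file path and field names are assumptions."""
import json
import time

def report(site, up, value=None, path="legacy-checks.ndjson"):
    event = {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "site": site,   # tenant/site identifier (hypothetical field)
        "up": up,       # result of the scrape check
        "value": value, # value captured from the DOM, if any
    }
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

report("tenant-a", True, value=42)
```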
I am used to message/MQ frameworks that can create a new task based on a condition and recurse; I'm simply learning what Beats/Logstash can do. The goal is to keep everything inside the framework rather than creating external components in the workflow. Digging into X-Pack, I think Watcher/Alerter may give me some post-condition abilities as well, but if I can solve the initial site checks this will be an amazing solution. Thanks for your help and patience.
Some brainstorming on how this feature might look in Beats.
Thanks. I added a comment to it and would be happy to help or test on that one. It will be a common use case, I think. I'll be talking to one of your territory managers tomorrow about our engagement with X-Pack and will highlight this. I appreciate the depth you've gone to in chasing it.