How to scrape a page with Logstash

We have an application that does not save its data to the database in real time; however, it has a page that shows some of this information in real time. I would like to access and scrape this information and put it into Elasticsearch.
Note: this page says that it requires JavaScript and uses the Flash Player plugin.

Are you able to use the curl command to scrape the information from that page?
If you cannot get the data you need in a structured text format, or in something that would fit a grok filter, then you will likely not be able to parse the information on the page into Elasticsearch.
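As a rough illustration of what "structured text" means here, the sketch below uses a Python regular expression in place of a grok pattern (grok patterns are themselves built on regular expressions) to pull `field: value` pairs out of plain text, then formats each record as an Elasticsearch `_bulk` NDJSON body. The sample text, the field names and the index name `realtime-status` are all hypothetical; it is a sketch under those assumptions, not a drop-in replacement for a Logstash pipeline.

```python
import json
import re

# Hypothetical pattern standing in for a grok filter: one "field: value" pair per line.
LINE_PATTERN = re.compile(r"^\s*(?P<field>[A-Za-z_][\w ]*?)\s*:\s*(?P<value>.+?)\s*$")


def parse_status_text(text: str) -> dict:
    """Collect every 'field: value' line into a dict; lines that don't match are skipped."""
    record = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            # Normalise field names to lower_snake_case before indexing.
            key = match.group("field").strip().lower().replace(" ", "_")
            record[key] = match.group("value")
    return record


def to_bulk_ndjson(records: list, index: str = "realtime-status") -> str:
    """Build a _bulk body: an action line, then the document, each newline-terminated."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# Hypothetical page text, as it might come back from curl.
sample = """Machine: press-01
Status: RUNNING
Parts produced: 1234
"""
print(to_bulk_ndjson([parse_status_text(sample)]))
```

The resulting body would be POSTed to Elasticsearch's `/_bulk` endpoint with `Content-Type: application/x-ndjson`; inside Logstash, a grok filter plus the elasticsearch output would play the same roles.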

If something simple like cURL doesn't satisfy your needs, I would suggest:

  1. Python's 'requests' library, possibly paired with something like 'beautifulsoup' if you really need to scrape HTML.
  2. Selenium (which is really easy to use with its Docker image), paired with a Python Selenium script; this could be useful if you actually do need JavaScript.
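A minimal sketch of the first option, written with only the Python standard library (`urllib.request` for fetching and `html.parser` swapped in for 'requests' and 'beautifulsoup', so it runs without extra packages). It assumes, hypothetically, that the values sit in `<td>` cells of an HTML table, and it only helps if the data is already present in the HTML the server returns; if JavaScript or Flash fills it in after the page loads, there is nothing here to parse.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False
        self._current_row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag == "td" and self._current_row is not None:
            self._in_cell = True
            self._current_row.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            # Trim whitespace only once the whole cell has been read.
            self._current_row[-1] = self._current_row[-1].strip()
        elif tag == "tr" and self._current_row is not None:
            if self._current_row:
                self.rows.append(self._current_row)
            self._current_row = None

    def handle_data(self, data):
        if self._in_cell:
            # A cell's text can arrive in several chunks (e.g. around <b> tags).
            self._current_row[-1] += data


def parse_rows(html: str) -> list:
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rows(url: str) -> list:
    # Network call: only useful if the server sends the data in the initial HTML.
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return parse_rows(response.read().decode(charset))


# Hypothetical markup, standing in for the real page.
sample_html = (
    "<table><tr><td>Status</td><td> RUNNING </td></tr>"
    "<tr><td>Count</td><td>42</td></tr></table>"
)
print(parse_rows(sample_html))
```

With 'requests' and 'beautifulsoup' installed, the same idea is shorter (`requests.get(url).text` passed to `BeautifulSoup(...)`, then `find_all("td")`), at the cost of two third-party packages.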

However... it should be realised that most things that "need" JavaScript are really just a JavaScript (or Flash) frontend that talks to a backend over some HTTP API. The typical challenge here is that some of these APIs expect authentication to be handled by the browser session, so they may require things like CSRF headers, and authentication may be a little trickier, perhaps even involving an SSO solution such as SAML or OAuth2. I've had success with Selenium in this regard, even if it's just getting to a point where I have enough session information to then pass to Python's 'requests' API.
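A sketch of skipping the frontend and calling the backend API directly, reusing session details that the browser (or a Selenium session, e.g. via `driver.get_cookies()`) already holds. The endpoint path `/api/status`, the cookie name `SESSIONID` and the header name `X-CSRF-Token` are hypothetical placeholders; the real ones have to be copied from the request the page actually makes, as seen in the browser's developer tools.

```python
import json
from urllib.request import Request, urlopen


def build_api_request(base_url: str, session_cookie: str, csrf_token: str) -> Request:
    """Build a GET request that mimics what the page's frontend sends to its backend.

    The path, cookie name and header name are hypothetical; replace them with
    the ones observed in the browser's developer tools.
    """
    return Request(
        base_url.rstrip("/") + "/api/status",  # hypothetical endpoint
        headers={
            "Accept": "application/json",
            "Cookie": f"SESSIONID={session_cookie}",  # hypothetical cookie name
            "X-CSRF-Token": csrf_token,  # hypothetical CSRF header name
        },
        method="GET",
    )


def fetch_json(request: Request) -> dict:
    # Sends the request and decodes the JSON body (network access required).
    with urlopen(request, timeout=10) as response:
        return json.load(response)


req = build_api_request("https://app.example.com/", "abc123", "tok456")
print(req.full_url)
```

If the backend returns JSON like this, the result can go straight into Elasticsearch without any HTML parsing at all, which is usually far less fragile than scraping the rendered page.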

If you do actually need to use Flash (some products only need a Flash interface for some parts of the user interface, such as graphs), then you may not see the traffic it makes in your browser's developer tools. In that case I would recommend having a look at something like Burp Suite Community Edition (a web security penetration-testing tool), which can man-in-the-middle your browser; it's good for observing (and manipulating) traffic between client and server, especially over SSL/TLS.
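Once Burp has shown which request the Flash component makes, it can help to replay that request from a script while still routing it through Burp, so both can be compared side by side. The sketch below assumes Burp's default proxy listener at `127.0.0.1:8080`; the `ca_file` argument is a hypothetical path to Burp's CA certificate exported and converted to PEM, which is needed because Burp re-signs HTTPS traffic with its own CA.

```python
import ssl
from typing import Optional
from urllib.request import HTTPSHandler, OpenerDirector, ProxyHandler, build_opener

# Burp's default proxy listener; change this if the listener was reconfigured.
BURP_PROXY = "http://127.0.0.1:8080"


def build_proxied_opener(proxy_url: str = BURP_PROXY, ca_file: Optional[str] = None) -> OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS traffic through the proxy.

    ca_file (hypothetical path) should point at Burp's CA certificate in PEM form;
    without it, HTTPS requests through Burp will fail certificate verification.
    """
    proxy_handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    # With ca_file=None this falls back to the system's default trusted CAs.
    context = ssl.create_default_context(cafile=ca_file)
    return build_opener(proxy_handler, HTTPSHandler(context=context))


opener = build_proxied_opener()
# opener.open("https://app.example.com/...") would now show up in Burp's HTTP history.
```

Note that this only covers traffic the script itself sends; to capture what the Flash plugin sends, the browser's own proxy settings still need to point at Burp.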

Hope that helps,
Cameron

Thank you for the information.
I don't know whether Flash is needed; what I do know is that I only want the data, not even an exact graph. If I can just collect the underlying data, that will be great. Thank you, you really gave me a direction. The problem is that I'll have to work it out myself; to explain: I'm very weak when it comes to the headers and the order of the information in the HTML.
