REST API, http_poller and ruby code

I wonder if the following functionality, which I've successfully implemented in the form of a client-side Python script, can be completely replaced by server-side Logstash configuration, using a combination of http_poller input and ruby filter plugins.

This is what my Python script currently does:

  • Polls a REST API every x seconds and asks for the latest record
  • If the latest record is more than one step ahead of the last one downloaded, it runs a for-loop and downloads the new records one by one, ensuring that no record is missed
  • It stores the last_run, i.e. the last record downloaded, so that if the script fails it can resume from that point
  • It can historically load records if I provide it with a starting record and number of records that I want it to download
  • It writes a log, which I then ship to Elastic with Filebeat
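
For readers following along, the steps listed above can be sketched roughly as follows. This is not the actual script from this post, only a minimal illustration. It assumes record ids are consecutive integers starting at 1, and every name in it (catch_up, backfill, fetch_latest_id, fetch_record, emit, the JSON state file) is hypothetical. The two fetch callables stand in for the real REST calls and emit stands in for writing a log line.

```python
import json
import os
import tempfile


def load_last_id(state_path, default=0):
    """Return the last record id saved on disk, or `default` if none exists."""
    try:
        with open(state_path) as fh:
            return json.load(fh)["last_id"]
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return default


def save_last_id(state_path, last_id):
    """Write the checkpoint via a temp file so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump({"last_id": last_id}, fh)
    os.replace(tmp_path, state_path)


def catch_up(fetch_latest_id, fetch_record, emit, state_path):
    """Download every record between the saved checkpoint and the latest id.

    Assumption: ids are consecutive integers, so the missing records are
    exactly last_id + 1 .. latest_id. Returns the number of records fetched.
    """
    last_id = load_last_id(state_path)
    latest_id = fetch_latest_id()
    for record_id in range(last_id + 1, latest_id + 1):
        emit(fetch_record(record_id))
        save_last_id(state_path, record_id)  # checkpoint after each record
    return max(latest_id - last_id, 0)


def backfill(fetch_record, emit, start_id, count):
    """Historical load: fetch `count` records starting at `start_id`."""
    for record_id in range(start_id, start_id + count):
        emit(fetch_record(record_id))
```

In a real script, catch_up would be called in a loop with a sleep of x seconds between calls. Checkpointing after every record trades a little disk I/O for the guarantee that a restart never skips a record, at the cost of possibly re-emitting the one record that was in flight when the script died.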

I used client-side scripting rather than server-side Logstash plugins for the following reasons:

  • The http_poller input plugin seems to be stateless (see this post from @guyboertje), but I wonder if you can get around that by using ruby to write to a file
  • There's no easy way to run a first query asking for the latest record id and then loop to retrieve the new records, avoiding the sleep_time. Again, maybe all of this is possible in ruby.

On the flip side, there would be some advantages if I could do this server-side in Logstash, namely:

  • A server-side script can easily be used across multiple sources, with no need to install and monitor the script on each client and no client-side dependencies such as log rotation
  • This could become the basis of a custom plugin, further enriching and enhancing the received data in a cloud-based way.

I wonder if anyone has experience with this dilemma, and whether you also resorted to client-side scripting or think this is achievable with a more advanced use of Logstash and its plugins.

You cannot do it using an http_poller input. No ruby filters will have executed when the input runs and there is no way to pass state to it.

You might be able to use an http filter. You could use any of the inputs that have a schedule option to create dummy events, then use a combination of ruby and http filters to do the work on that schedule.

It feels a bit like solving the Towers of Hanoi problem in sendmail.cf. It can be done, and it is interesting to see it work, but that doesn't make it a good idea :)

@Badger, thanks for the reply. I was not aware that the HTTP filter plugin offers similar functionality, and it does make sense that it can be combined more easily with ruby filters.

I guess I need to find some examples that combine these filters, so I can understand and assess the tradeoff in complexity of this approach. My experience so far with ruby inside Logstash has been that it's quite hard to debug and to be sure that it's working properly.

And I wonder whether, at that point, you reach the stage of contemplating the development of a custom Logstash plugin.
