I wonder whether the following functionality, which I've successfully implemented as a client-side Python script, could be completely replaced by server-side Logstash configuration, using a combination of the http_poller input and ruby filter plugins.
This is what my Python script currently does:
- Polls a REST API every x seconds and asks for the latest record
- If the latest record is more than one step ahead, it runs a for-loop that downloads all the new records one by one, ensuring that no record is missed
- It stores the last_run, i.e. the last record downloaded, so that if the script fails it can resume from that point
- It can load historical records if I give it a starting record and the number of records I want it to download
- It writes a log file, which I then ship to Elastic using Filebeat
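A rough Python sketch of that loop, for readers who want to picture it. This is a minimal illustration under stated assumptions, not the script itself: the base URL, the /records/latest and /records/{id} endpoints, and the use of consecutive integer record ids are all hypothetical. What it shows is that the catch-up step fetches every id between last_run and the latest id back to back, and only sleeps once it has caught up; the historical load is the same loop with an explicit start and count.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator, Tuple

# Hypothetical API layout: the real endpoints are not part of this post.
BASE_URL = "https://api.example.com"


def http_get_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_latest_id() -> int:
    # Assumed endpoint returning e.g. {"id": 1234} for the newest record.
    return int(http_get_json(f"{BASE_URL}/records/latest")["id"])


def fetch_record(record_id: int) -> dict:
    # Assumed endpoint returning a single record by id.
    return http_get_json(f"{BASE_URL}/records/{record_id}")


def catch_up(last_run: int, latest_id: int,
             fetch: Callable[[int], dict]) -> Iterator[Tuple[int, dict]]:
    """Yield (id, record) for every id after last_run up to latest_id, so no record is skipped."""
    for record_id in range(last_run + 1, latest_id + 1):
        yield record_id, fetch(record_id)


def backfill(start_id: int, count: int,
             fetch: Callable[[int], dict]) -> Iterator[Tuple[int, dict]]:
    """Historical load: `count` records starting at start_id."""
    return catch_up(start_id - 1, start_id + count - 1, fetch)


def poll_forever(last_run: int, interval_s: float,
                 save_last_run: Callable[[int], None],
                 write_log_line: Callable[[str], None]) -> None:
    """Poll for the latest id, fetch everything missed, then sleep; never returns."""
    while True:
        latest_id = fetch_latest_id()
        for record_id, record in catch_up(last_run, latest_id, fetch_record):
            write_log_line(json.dumps(record))  # one JSON line per record, for Filebeat to tail
            last_run = record_id
            save_last_run(last_run)             # checkpoint so a restart resumes here
        time.sleep(interval_s)                  # wait only after catching up
```

Because the checkpoint is saved after each record is logged, a crash can at worst cause one record to be logged twice on restart, but never cause a gap.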
I used client-side scripting rather than server-side Logstash plugins for the following reasons:
- The http_poller input plugin seems to be stateless (see this post from @guyboertje), but I wonder whether using ruby to write the state to a file could get around that
- There's no easy way to first query for the latest record id and then run a loop to retrieve the new records, avoiding the sleep_time. Again, maybe all this is possible in ruby.
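Whichever side ends up holding the state, the part that has to survive failures is the last_run checkpoint itself. Below is a minimal Python sketch of a crash-safe checkpoint file; the JSON layout and file name are hypothetical. The idea is to write to a temporary file and rename it over the old one, so a crash mid-write never leaves a truncated checkpoint. A ruby filter would presumably need an equivalent write-then-rename step (not shown here), and if that filter ran in more than one pipeline worker, concurrent writes would also need coordinating, for example by running that pipeline with a single worker.

```python
import json
import os
import tempfile


def load_last_run(path: str, default: int = 0) -> int:
    """Return the checkpointed record id, or `default` if no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return int(json.load(f)["last_run"])
    except FileNotFoundError:
        return default


def save_last_run(path: str, last_run: int) -> None:
    """Write the checkpoint so readers see either the old or the new value, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".last_run.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"last_run": last_run}, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_path, path)  # atomic replacement on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind on failure
        raise
```

On startup, load_last_run supplies the resume point; after each record, save_last_run records progress.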
On the flip side, doing this server-side in Logstash would have some advantages, namely:
- A server-side script can easily be reused across multiple sources, without having to install and monitor the script on each client and without client-side concerns such as log rotation
- It could become the basis of a custom plugin that further enriches the received data in a cloud-based way.
I wonder whether anyone has faced this dilemma: have you also resorted to client-side scripting, or do you think this is achievable with a deeper understanding of Logstash and its plugins?