ETL with Logstash

I am trying to implement an ETL process with Logstash that copies data between two Elasticsearch instances, from a remote server to localhost. First, I want to get the maximum value of the "endTime" field across the documents on localhost. I then want to store this value in an environment variable and use it in the ETL pipeline, so that only events from the remote server whose "endTime" is greater than the stored value are pulled.

I have set up the pipeline for this ETL, and the conf file looks like this:

    input {
      elasticsearch {
        hosts => ["remotehost"]
        index => "source_index"
        # ${my_var} is resolved by Logstash from the environment at startup
        query => '{"query": {"bool": {"filter": [{"range": {"endTime": {"gt": "${my_var}"}}}]}}}'
      }
    }
    output {
      elasticsearch {
        user => "user"
        password => "pass"
        hosts => ["localhost:9200"]
        index => "destination_index"
      }
    }

Here my_var is the environment variable. I would like a cascaded execution: a first pipeline computes the max of the endTime field on localhost, stores it in the environment variable, and then triggers the main pipeline (described in the conf file above).

Is there a way to achieve this? Or is there a better approach than the one I have in mind? Thanks in advance for any thoughts.


I don't think you need two pipelines. The first pipeline would just make a single ES query to find the max value, and that's something you can do with curl (using jq to filter out the boring parts of the JSON response).

So basically you'd end up with something like this (preferably with better error handling, but you get the idea):

    export my_var="$(curl ... | jq ...)"
    logstash -f conf-file-you-just-posted.conf
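A slightly more complete sketch of the same idea, written as bash functions you can source. It assumes endTime is a date field (so Elasticsearch returns the max as value_as_string, which can be fed straight back into the range query), and it uses a max aggregation with "size": 0 so no hits are returned. The function names and the ES_USER/ES_PASS variables are hypothetical; adjust the URL, index, and credentials to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; meant to be sourced, not executed directly.

# Build a _search body that returns only max(<field>), no hits.
max_query() {
  printf '{"size":0,"aggs":{"max_end":{"max":{"field":"%s"}}}}' "$1"
}

# Read a _search response on stdin and print the max value.
# Prefers value_as_string (set for date fields), falls back to value,
# and prints nothing if the index is empty (value is null).
extract_max() {
  jq -r '.aggregations.max_end | .value_as_string // .value // empty'
}

# Compute max(endTime) on the local cluster, export it as my_var,
# then start Logstash with the pipeline config.
run_etl() {
  local es_url="$1" index="$2" conf="$3"
  local my_var
  # -f makes curl fail (and print nothing) on HTTP errors
  my_var="$(curl -fsS -u "${ES_USER}:${ES_PASS}" \
    -H 'Content-Type: application/json' \
    "$es_url/$index/_search" -d "$(max_query endTime)" | extract_max)"
  if [ -z "$my_var" ]; then
    echo "could not determine max endTime in $index" >&2
    return 1
  fi
  export my_var          # visible to the logstash child process
  logstash -f "$conf"
}
```

Usage: `run_etl http://localhost:9200 destination_index conf-file-you-just-posted.conf`. Bailing out when no value is found (for example, on the very first run against an empty index) is a deliberate choice here; you might instead run the pipeline once without the range filter to seed the local index.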

Thanks 🙂
