Pull data periodically from an Elasticsearch index


We have some log indices, and there is now a requirement to extract some pieces of information from them and put them in another index where we can run machine learning jobs on them.

Initially I was thinking of the reindex API. It does let me run a query, but I have not seen any examples where I can do a computation across events.

I have to find the time taken by subtracting the timestamp values of two events. The Logstash Aggregate filter looks like a good candidate.
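To make the computation concrete, here is a minimal Python sketch of what I am after, outside of Logstash. The field names `request_id` and `@timestamp` and the sample events are assumptions for illustration only; our real documents may use different fields to correlate the two events.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2024-01-01T10:00:00.000Z'."""
    # Older Python versions do not accept a trailing 'Z', so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def durations_by_key(events, key_field="request_id", ts_field="@timestamp"):
    """Return {key: seconds between the earliest and latest event for that key}."""
    first, last, count = {}, {}, {}
    for event in events:
        key = event[key_field]
        ts = parse_ts(event[ts_field])
        first[key] = min(first.get(key, ts), ts)
        last[key] = max(last.get(key, ts), ts)
        count[key] = count.get(key, 0) + 1
    # Keys seen only once have no second event to subtract from, so skip them.
    return {k: (last[k] - first[k]).total_seconds() for k in first if count[k] > 1}


# Made-up sample events (hypothetical field names and values).
events = [
    {"request_id": "a", "@timestamp": "2024-01-01T10:00:00.000Z"},
    {"request_id": "b", "@timestamp": "2024-01-01T10:00:05.000Z"},
    {"request_id": "a", "@timestamp": "2024-01-01T10:01:30.500Z"},
]
print(durations_by_key(events))
```

The key `b` is left out because only one of its events is present, which is also the case I would need to handle when the two events fall into different scheduled runs.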

However, the Logstash Elasticsearch input does not track which events it has already processed the way the jdbc input does. I want to run the job every 20 minutes, which I can do with the scheduling option in the Elasticsearch input plugin, but I think it will read all the events over again on each run.

Any ideas?

EDIT: There is an answer on Stack Overflow which is a bit encouraging.

What it says is:
query => [your ES query, returning everything in the last 2 minutes]
schedule => "*/2 * * * *"

This will run the input collection every 2 minutes, and return everything with a timestamp in the last 2 minutes.
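If the time range has to be spelled out in the query itself, I assume the body would look roughly like what this Python sketch builds. The `@timestamp` field name and the helper `last_minutes_query` are my own assumptions; `now-20m` is standard Elasticsearch date math.

```python
import json


def last_minutes_query(minutes, ts_field="@timestamp"):
    """Build a query body matching documents from the last `minutes` minutes."""
    # Date math: "now-20m" means 20 minutes before the time the query runs.
    window_start = f"now-{minutes}m"
    return {"query": {"range": {ts_field: {"gte": window_start, "lt": "now"}}}}


# JSON string that could be pasted into the input's `query` option.
print(json.dumps(last_minutes_query(20)))
```

Since the window is relative to the moment each run starts, a delayed or skipped run could presumably miss events or pick some up twice.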

Does this plugin add a time range to the query behind the scenes?
