We have some log indices, and there is now a requirement to extract certain pieces of information from them and write them to another index, where we can run machine learning jobs on them.
Initially I was thinking of the Reindex API. It lets me restrict the source documents with a query, but I have not seen any examples where it can do a computation across events.
I need to compute the time taken by subtracting the timestamps of two related events. The Logstash aggregate filter looks like a good candidate.
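To make this concrete, here is a rough sketch of what I have in mind with the aggregate filter. The field names `request_id`, `event_type` and `duration_seconds` are hypothetical placeholders for whatever correlates and labels the two events in the real logs, and the 1200-second timeout is only an assumed value.

```
filter {
  # First event of a pair: remember its timestamp in the shared map.
  if [event_type] == "start" {
    aggregate {
      task_id => "%{request_id}"   # hypothetical correlation field
      code => "map['started_at'] = event.get('@timestamp').to_f"
      map_action => "create"
    }
  }
  # Second event of the pair: subtract and store the elapsed seconds.
  if [event_type] == "end" {
    aggregate {
      task_id => "%{request_id}"
      code => "event.set('duration_seconds', event.get('@timestamp').to_f - map['started_at'])"
      map_action => "update"
      end_of_task => true
      timeout => 1200              # assumed: drop unmatched starts after 20 minutes
    }
  }
}
```

As far as I understand, the aggregate filter needs a single pipeline worker and the two events have to arrive in order, so the input query would probably need to sort by timestamp.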
However, the Logstash Elasticsearch input does not track which events it has already processed the way the JDBC input does. I want to run the job every 20 minutes, which I can do with the schedule option of the Elasticsearch input plugin, but I think it will read all the events again on every run.
EDIT: There is an answer here on Stack Overflow which is a bit encouraging.
It says:
query => [your ES query, returning everything in the last 2 minutes]
schedule => "*/2 * * * *"
This will run the input collection every 2 minutes, and return everything with a timestamp in the last 2 minutes.
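Adapted to my 20-minute case, I imagine the input would look roughly like the sketch below. It assumes the time window has to be written explicitly into the query as a range on `@timestamp`; the host and the index pattern `logs-*` are placeholders, not our real values.

```
input {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder host
    index => "logs-*"             # placeholder index pattern
    # Assumed: the window is part of the query itself, using date math.
    query => '{ "query": { "range": { "@timestamp": { "gte": "now-20m" } } }, "sort": [ { "@timestamp": "asc" } ] }'
    # Cron-style schedule: run every 20 minutes.
    schedule => "*/20 * * * *"
  }
}
```

With a window that exactly matches the schedule interval, events near the boundary could presumably be missed or read twice if a run starts late, which is part of why I am asking how the plugin handles this.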
Does this plugin add a time range to the query behind the scenes?