Periodically run an Elasticsearch query for new records since the last run

Hi,

I've asked this before but have not gotten a solution. I need to run an Elasticsearch query periodically, every minute or so, and return a count of the new records added since the last time the query was executed.

I tried the "http_poller" input plugin, but it cannot store the last timestamp to filter by on the next run. Could someone please suggest a solution I could try?

Thanks,
E

There are no standard plugins for this but it should be fairly easy to write a custom plugin that does exactly what you want.

@magnusbaeck

I see, http_poller just needs to be extended with a record_last_run. Haven't had to extend a plugin before but I will give it a shot when I get some time.

Thanks,
E

I was able to extend the http_poller input plugin to store the last @timestamp returned by a query and then filter for records with a greater @timestamp on each subsequent API call. This works, but @timestamp alone is not enough: it only has millisecond resolution, so additional records could be added with the same timestamp after I make the API call.
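For readers following along, the checkpoint idea described here can be sketched roughly as follows. This is not the actual plugin extension (which would be Ruby inside Logstash); it is a minimal Python illustration, and the state file layout and the function names `load_last_timestamp`, `save_last_timestamp`, and `build_count_query` are assumptions made for the example.

```python
import json
from pathlib import Path


def load_last_timestamp(state_file, default="1970-01-01T00:00:00.000Z"):
    """Return the checkpoint saved by the previous run, or a default on the first run."""
    path = Path(state_file)
    if not path.exists():
        return default
    return json.loads(path.read_text())["last_timestamp"]


def save_last_timestamp(state_file, timestamp):
    """Persist the newest @timestamp seen so the next run can filter on it."""
    Path(state_file).write_text(json.dumps({"last_timestamp": timestamp}))


def build_count_query(last_timestamp):
    """Request body matching documents strictly newer than the checkpoint."""
    return {"query": {"range": {"@timestamp": {"gt": last_timestamp}}}}
```

The body from `build_count_query` would be sent to the index's `_count` endpoint on each run. Because the filter is a strict `gt`, a document indexed later with the same millisecond value as the checkpoint is never counted, which is exactly the gap described above.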

Could you advise on how to handle such a case? Perhaps by using an additional field? If so, what would be the easiest/best solution?

Thanks,
E

I think it'll be very hard to do this correctly and without race conditions. Even if you save a high-resolution timestamp before or after you issue the ES query there will be cases where you'll drop events. Is there any way you could siphon off the new documents before they're stored in ES in the first place? For example, could the system posting into ES instead post to a message queue that you could subscribe to?
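One common way to narrow (not eliminate) this race, which nobody in this thread proposed and which is offered here only as a hedged sketch, is to query half-open time windows that stop a little short of "now". Each millisecond then belongs to exactly one window, and documents that show up slightly late are still counted as long as they arrive within the lag. The function names and the 5-second default lag are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def iso_ms(dt):
    """Format an aware datetime as a UTC ISO-8601 string with millisecond precision."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def next_window(lower, now, lag=timedelta(seconds=5)):
    """Build a count query for the half-open window [lower, now - lag).

    Returns the request body and the upper bound, which the caller should
    persist and pass back in as `lower` on the next run.
    """
    upper = max(lower, now - lag)  # never move the checkpoint backwards
    body = {
        "query": {
            "range": {"@timestamp": {"gte": iso_ms(lower), "lt": iso_ms(upper)}}
        }
    }
    return body, upper
```

Consecutive windows share a boundary (`lt` of one run equals `gte` of the next), so no timestamp is skipped or counted twice. As noted above, events are still dropped whenever a document with an old @timestamp becomes searchable later than the chosen lag.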

Agreed, a timestamp alone is not going to work. I'm not currently using a message queue, and adding one just to resolve this scenario seems like a bit much right now. If one were already in place I would have pursued it, though.

Is it possible to set up ES with an auto-increment field on insert? If so, I could query on it to find new records. If not, I need a way for Logstash to generate an auto-incrementing value and send it to ES so I can query on it later.

Thanks,
E

ES does not support auto-incrementing fields.
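Given that, any counter would have to be assigned by whatever writes the documents, before they are indexed. A minimal sketch of that idea, assuming a single writer process and a hypothetical field name `seq` (neither comes from this thread):

```python
import itertools


class SequenceStamper:
    """Assigns a monotonically increasing number to each outgoing document.

    Assumption: exactly one writer. The counter lives in memory, so a real
    setup would also need to persist it across restarts.
    """

    def __init__(self, start=1):
        self._counter = itertools.count(start)

    def stamp(self, doc):
        # Copy so the caller's document is not mutated.
        stamped = dict(doc)
        stamped["seq"] = next(self._counter)
        return stamped


def build_seq_count_query(last_seq):
    """Request body matching documents with a sequence number above the checkpoint."""
    return {"query": {"range": {"seq": {"gt": last_seq}}}}
```

With several writers the counter would need to come from a shared source, and this approach does not by itself guarantee that documents become searchable in sequence order.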
