Latency buffer for Rollup jobs - recommended delay and/or what it depends on

alter · January 13, 2020, 11:49am

Hi!

We are using a real-time rollup job to aggregate events into metrics. The scenario:

events are coming from Logstash; the peak ingestion rate is around 100k-120k events per minute
the rollup job is scheduled to run every minute and the time bucket size is 1m

We've noticed an anomaly with our scenario:

If the rollup job runs with no latency buffer, the event count in the aggregated data is 5-7% lower than in the raw events.
If the rollup job runs with a latency buffer of 1h, the aggregated data perfectly matches the raw events.
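
Editor's sketch: the gap described above is consistent with the documented behaviour that a rollup job does not revisit a time span once it has been rolled up, so documents that become searchable after their bucket was processed are never counted. The toy model below illustrates that effect. It is a simplified simulation, not the actual rollup indexer; the function name, the synthetic event stream and the 90-second lag are all assumptions made for illustration.

```python
def simulate_rollup(events, delay_s, bucket_s=60, run_every_s=60):
    """Toy model of a rollup job (not the real indexer).

    events: list of (event_ts, indexed_ts) pairs, in seconds.
    Returns (raw_count, rolled_up_count).
    """
    last_run = max(idx for _, idx in events) + delay_s + bucket_s + run_every_s
    rolled_up = 0
    next_bucket_start = 0  # every bucket before this point is already finalized
    now = run_every_s
    while now <= last_run:
        # Only buckets that ended at or before (now - delay) are eligible.
        limit = ((now - delay_s) // bucket_s) * bucket_s
        if limit > next_bucket_start:
            # Count only documents that are already indexed at run time.
            rolled_up += sum(
                1 for ev, idx in events
                if next_bucket_start <= ev < limit and idx <= now
            )
            next_bucket_start = limit  # these buckets are never revisited
        now += run_every_s
    return len(events), rolled_up


# Synthetic stream: one event every 6 s for 10 minutes. The last event of
# each minute is indexed 90 s late, all others 2 s late (assumed values).
events = [(t, t + (90 if t % 60 == 54 else 2)) for t in range(0, 600, 6)]

print(simulate_rollup(events, delay_s=0))    # (100, 90): late events are missed
print(simulate_rollup(events, delay_s=120))  # (100, 100): delay exceeds the lag
```

In this model the loss disappears as soon as the delay is larger than the worst indexing lag, which is why a 1h buffer hides the problem but is likely far larger than needed.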

Our goal is to have aggregated metrics as close to real time as possible, which ideally means finding the smallest latency buffer that still guarantees no events are lost in aggregation. This is hard to achieve without a clear understanding of:

what is going on under the hood of a rollup job (in particular, how it deals with indexing delays, with events added with a past timestamp, etc.)
which factors influence the ability to aggregate all events (hardware specs, indexing rate, indexing delay, etc.)

So this is an open question: how do we work out the required latency buffer and the factors that influence it? How well do rollups currently cope with indexing delays? Can anyone share experience with real-time rollups? What are the recommendations for reducing the required latency buffer?

Any help and knowledge share is appreciated!

Best regards,
Andrey.

alter · January 14, 2020, 2:39pm

*sigh* I know rollups are not exactly a widespread feature, but still - is there anyone who can share knowledge or experience?

Thank you.

Christian_Dahlqvist · January 14, 2020, 3:52pm

The rollup search API supports querying rolled-up data and raw data at the same time, so I do not understand why delaying rollup processing would be a problem. The delay is supposed to be greater than your maximum indexing delay.
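
Editor's sketch: the delay mentioned here is set with the `delay` parameter of the job's `date_histogram` group. The snippet below builds a minimal job body that could be sent to `PUT _rollup/job/<job_id>`. The index names, page size and the 5m value are placeholders and not taken from this thread; `fixed_interval` assumes Elasticsearch 7.2 or later.

```python
import json


def build_rollup_job(delay):
    """Return a minimal rollup job body with the given latency buffer."""
    return {
        "index_pattern": "events-*",       # hypothetical raw index pattern
        "rollup_index": "events_rollup",   # hypothetical rollup index
        "cron": "0 * * * * ?",             # run at second 0 of every minute
        "page_size": 1000,
        "groups": {
            "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "1m",    # 1m buckets, as in the scenario
                "delay": delay,            # should exceed the max indexing delay
            }
        },
    }


job = build_rollup_job("5m")
print(json.dumps(job, indent=2))
```

With this setting, each run only rolls up buckets older than five minutes, so documents that arrive up to five minutes late are still included.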

alter · January 14, 2020, 4:26pm

Thank you for the response!

The current rollup implementation has certain limitations, in particular: a single index (i.e. no daily pattern), no possibility of enrichment during or after rolling up, query limitations, and lifecycle management limitations. Therefore we are building our own aggregations on top of the rollup index, with enrichment along the way. That is where the concern about delay comes from, as well as the interest in what happens behind the scenes of rollup jobs.

"The delay is supposed to be greater than your maximum indexing delay."

That is a good answer to one of our questions (i.e. rollups vs. indexing delays)! By the way, are there any good links on how to calculate indexing delays? But the question of whether rollups are able to aggregate events added with a past timestamp remains open...
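
Editor's sketch: one possible way to estimate indexing delay is to store an ingest time on each document next to the event time and then look at the difference. The example assumes a field named `ingested_at` (hypothetical; it could be filled by an ingest pipeline `set` processor using `{{_ingest.timestamp}}`) alongside `@timestamp`. The sample documents are made up.

```python
import math
from datetime import datetime


def indexing_lags(docs, event_field="@timestamp", ingest_field="ingested_at"):
    """Return the sorted lag in seconds between event time and ingest time."""
    lags = []
    for doc in docs:
        event_time = datetime.fromisoformat(doc[event_field])
        ingest_time = datetime.fromisoformat(doc[ingest_field])
        lags.append((ingest_time - event_time).total_seconds())
    return sorted(lags)


def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


# Made-up sample: three prompt documents and one that arrived 95 s late.
docs = [
    {"@timestamp": "2020-01-13T11:49:00+00:00", "ingested_at": "2020-01-13T11:49:02+00:00"},
    {"@timestamp": "2020-01-13T11:49:10+00:00", "ingested_at": "2020-01-13T11:49:13+00:00"},
    {"@timestamp": "2020-01-13T11:49:20+00:00", "ingested_at": "2020-01-13T11:49:25+00:00"},
    {"@timestamp": "2020-01-13T11:49:30+00:00", "ingested_at": "2020-01-13T11:51:05+00:00"},
]

lags = indexing_lags(docs)
print("max lag (s):", lags[-1])
print("median lag (s):", percentile(lags, 50))
```

The maximum (or a high percentile plus some margin) observed over a representative period gives a lower bound for the rollup `delay`. Events written with a deliberately old timestamp show up here as very large lags, which makes it visible how many of them would fall outside a given buffer.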
