Rollups: What happens when docs arrive a few days late?

Paul_Ainslie · August 30, 2018, 1:15pm

We have a dozen or so servers that we pull access logs from. Logstash runs on each of them and sends the logs to our ES cluster. I've now configured a 1-hour rollup in our ES cluster. Over the years we've had a few occasions where Logstash was down for a day or two before we caught the problem; once we restarted Logstash, all the old records were sent to ES.

But what happens with rollups in this scenario? If the rollup job runs every hour, will it detect that older data was added to ES? Or will the older data simply be ignored?

polyfractal · August 30, 2018, 1:41pm

The latter option: late data is currently ignored after the time period is rolled up.

If you know there is a reasonable chance of late-arriving data, you can set the delay parameter on the rollup job. That prevents the job from rolling up a bucket of time until the delay has passed. E.g. if you set it to 24h, it will wait 24 hours after the bucket is "finished" before rolling it up, to allow late data to arrive.
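As a rough sketch of where the delay setting goes, the snippet below builds a rollup job body in Python and prints it as JSON. The index pattern, rollup index, field names, and metrics are placeholders rather than anything from this thread. The body is meant to be sent with a PUT to the rollup job endpoint, which in the 6.x releases sits under _xpack/rollup/job/<job_id>; later versions split interval into fixed_interval and calendar_interval, so check the docs for your version.

```python
import json


def build_rollup_job(delay: str = "24h") -> dict:
    """Build a rollup job body whose hourly buckets wait `delay` before rollup.

    All index and field names here are hypothetical placeholders.
    """
    return {
        "index_pattern": "access-logs-*",      # placeholder source indices
        "rollup_index": "access-logs-rollup",  # placeholder destination index
        "cron": "0 0 * * * ?",                 # check for work at the top of every hour
        "page_size": 1000,
        "groups": {
            "date_histogram": {
                "field": "@timestamp",
                "interval": "1h",              # 1-hour rollup buckets
                "delay": delay,                # hold each bucket back this long
            }
        },
        "metrics": [
            {"field": "bytes", "metrics": ["sum", "avg"]}
        ],
    }


job = build_rollup_job("24h")
print(json.dumps(job, indent=2))
```

Note that the delay lives inside the date_histogram group, not at the top level of the job.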

The downside is of course your rollups (and aggregations/visualizations/etc) are always lagging by the delay amount. If you don't mind the delay -- or have live data to fill in the gap -- adding a day or seven to the delay would be the solution.
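To make the trade-off concrete, here is a simplified model of the timing, assuming a bucket becomes eligible for rollup once its end time plus the delay has passed. It ignores the cron schedule and any indexing latency, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def earliest_rollup_time(bucket_start: datetime,
                         interval: timedelta,
                         delay: timedelta) -> datetime:
    """Earliest moment the bucket may be rolled up: bucket end plus delay."""
    return bucket_start + interval + delay


def arrives_in_time(bucket_start: datetime,
                    arrival: datetime,
                    interval: timedelta,
                    delay: timedelta) -> bool:
    """True if a doc for this bucket is indexed before the bucket is eligible."""
    return arrival < earliest_rollup_time(bucket_start, interval, delay)


bucket_start = datetime(2018, 8, 28, 10, 0)
one_hour = timedelta(hours=1)
# Logstash was down, so the doc shows up two days after its bucket started.
arrival = bucket_start + timedelta(days=2)

caught_with_24h = arrives_in_time(bucket_start, arrival, one_hour, timedelta(hours=24))
caught_with_7d = arrives_in_time(bucket_start, arrival, one_hour, timedelta(days=7))
print(caught_with_24h)  # False: the bucket was eligible a day earlier
print(caught_with_7d)   # True: the bucket is still being held back
```

In this model a two-day outage slips past a 24h delay but is covered by a 7d delay, at the cost of rollups that lag a week behind.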

It's theoretically possible to go back and update the bucket when new data arrives. We'd probably need the user to invoke some sort of "re-rollup" API to target a specific time period, and then we could go remove all existing docs for that interval and re-rollup. But it sounded sufficiently complicated that we decided to leave that on the wishlist for now.

Paul_Ainslie · September 3, 2018, 4:50pm

Yeah, a "re-rollup" API would be amazing, but I can appreciate its complexity. Would it work for a "re-rollup" to create a newer "_version" of the given doc and then remove the doc with the older _version?

For now, I've reconciled myself to re-indexing my rollup whenever we're missing data. This happens a couple of times a year.

Actually, if the _rollup_search could point to an alias, this would partially solve the problem for us. If we created a new version of a rollup, we would simply update the alias to point to the newer index.
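For illustration, the alias switch itself would be an ordinary request to the _aliases API with a remove action and an add action in one call, so the swap is atomic. The sketch below only builds that request body; the alias and index names are hypothetical, and whether _rollup_search would resolve the alias is exactly the open question here.

```python
import json


def swap_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build an _aliases request body that moves `alias` from old to new index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: a rebuilt rollup index replaces the previous one.
body = swap_alias_actions("access-logs-rollup",
                          "access-logs-rollup-v1",
                          "access-logs-rollup-v2")
print(json.dumps(body, indent=2))
```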

