Rollup Job not working on ES 6.6.1

Hi there!

Weeks ago I opened a discussion about a similar problem but then I found a workaround.

Now I can't find one (I do not want to use external script when there's a specific ES function for aggregating data).

Now, I have data split across 24 indices per day (e.g. my_index-20190700, my_index-20190701...), one for each hour of the day, each collecting something like 700k docs (3.5 GB). Those indices have a retention period of 24 hours.

What I'm trying to do is schedule a Rollup Job that runs at 00:00 each day and collects the daily data, i.e. the data from the my_index-* indices.

I created a Rollup Job like the following:

{
  "config": {
    "id": "daily_rollup_job",
    "index_pattern": "my_index-*",
    "rollup_index": "daily_index_rollup",
    "cron": "0 0 0 * * ?",
    "groups": {
      "date_histogram": {
        "interval": "24h",
        "field": "my_interesting_date_field",
        "time_zone": "UTC"
      },
      "terms": {
        "fields": [
          "my_interesting_date_field.keyword",
          "another_interesting_field1.keyword",
          "another_interesting_field2.keyword",
          "another_interesting_field3.keyword"
        ]
      }
    },
    "metrics": [],
    "timeout": "20s",
    "page_size": 10000
  }
}
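One thing worth noting: the job above doesn't set a delay at all. In the Rollup config, the delay belongs on the date_histogram group. A sketch of just that group with a delay added (the 10m value is illustrative, not a recommendation):

```json
{
  "groups": {
    "date_histogram": {
      "interval": "24h",
      "field": "my_interesting_date_field",
      "time_zone": "UTC",
      "delay": "10m"
    }
  }
}
```

The delay tells the job how far behind "now" it should stop indexing, so documents that arrive late are not skipped past by the job's checkpoint.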

It did run for a couple of days but then it stopped.

The last time it ran was on July 1st.

What is the problem with these Rollup Jobs? Why do they stop running for no apparent reason?

Do I have to increase the delay for this amount of data? What exactly is the delay parameter used for?

Thank you in advance!

Does that "timeout": "20s" have anything to do with the problem? Maybe the job stops before it has processed all the data. If that's the case, how can I increase it?

But that would not explain why it worked for a couple of days with the same amount of data, nor why it didn't return even a small part of the data.

Might it be that it couldn't process all the data of July 1st, so it set a pointer to the last data it managed to process, and by the time it ran again (the following day) the data right after that pointer had already been deleted, so it could not properly resume processing?

That would be very strange, because if it's always the same index pattern, why should it care that the last data it processed were from 2019-07-01 at 5:00pm while what it finds when it starts again are data from 2019-07-02 at 01:00am? Makes no sense to me, but it might be something to start from.

Ideas?

Apparently, after increasing the timeout to 120s and the delay to 10m, it seems to work properly.

It's been working for 4 days in a row now.
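For reference, a sketch of what the adjusted job config looks like with those two values changed (everything else as in the original job; the delay goes on the date_histogram group):

```json
{
  "config": {
    "id": "daily_rollup_job",
    "index_pattern": "my_index-*",
    "rollup_index": "daily_index_rollup",
    "cron": "0 0 0 * * ?",
    "groups": {
      "date_histogram": {
        "interval": "24h",
        "field": "my_interesting_date_field",
        "time_zone": "UTC",
        "delay": "10m"
      }
    },
    "metrics": [],
    "timeout": "120s",
    "page_size": 10000
  }
}
```

With a 10m delay the job only rolls up documents older than ten minutes, which gives late-arriving documents time to be indexed before the job's checkpoint moves past them.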

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.