Index Rollup delay effect

#1

I previously experimented with a rollup job that ran every 30 minutes, with an interval of 1 hour, and a delay of 12 hours.

I wanted to experiment with rolling up older data. I have a new job that runs every 30 minutes, with an interval of 1 hour, and a delay of 7 days. After approx. 18 hours, I see no data has been rolled up. Do I need to wait a week for data to appear? Or am I missing something else? The previous job worked as expected.

(Zachary Tong) #2

Hey @mcguacon, apologies for the delay (on a business trip right now with minimal connectivity).

This is correct. The indexer wakes up on the cron schedule, looks at the most recent timestamp in the index, and decides whether it needs to process data based on the last position and the delay value. In this case, the delay means no documents will be generated until 7 days have passed.

The indexer isn't doing any work or buffering things up in memory, it's just going back to sleep until the delay has passed.

Hope that helps!

#3

That does help, thank you!
So just to clarify, if I make a job now with a 7d delay, it won't retroactively roll up all the data in my index that's 7 days old right now?

(Zachary Tong) #4

Hey, sorry for the delay (heh), was traveling for work.

It will, actually :) The way the indexer works is like this:

  • Indexer starts. Is there a persisted checkpoint for this job? If no, start from the beginning of the index; otherwise pick up from the most recent checkpoint
  • Is there data that is older than now - delay? If yes, start rolling up that data. If no, go back to sleep
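The two steps above can be sketched roughly as follows. This is only an illustration of the decision described in this post, not the actual indexer code: the function name `next_rollup_window` and its parameters are hypothetical, and the sketch ignores details such as rounding to the date histogram interval.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def next_rollup_window(
    now: datetime,
    delay: timedelta,
    oldest: datetime,
    newest: datetime,
    checkpoint: Optional[datetime] = None,
) -> Optional[Tuple[datetime, datetime]]:
    """Return the (start, end) time range eligible for rollup, or None to sleep.

    Hypothetical helper: `oldest` and `newest` stand for the first and last
    timestamps in the source index.
    """
    # Step 1: resume from the persisted checkpoint, else start of the index.
    start = checkpoint if checkpoint is not None else oldest
    # Step 2: only data older than (now - delay) may be rolled up.
    cutoff = now - delay
    end = min(newest, cutoff)
    if start >= end:
        return None  # nothing eligible yet: go back to sleep
    return (start, end)


now = datetime(2019, 1, 22, tzinfo=timezone.utc)
week = timedelta(days=7)

# All data is newer than now - 7d, so the job does nothing on this trigger.
print(next_rollup_window(now, week, now - timedelta(days=2), now))

# Three weeks of year-old data: the whole range is eligible.
old_start = now - timedelta(days=386)
old_end = now - timedelta(days=365)
print(next_rollup_window(now, week, old_start, old_end))
```

Under these assumptions, an index whose data all falls inside the delay window yields `None` on every trigger until the oldest documents age past the delay.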

So for example, if your index has three weeks of data, the first two weeks and one day will be rolled up, then the remaining 6 days will block until another day's worth of data is ingested (assuming the index ends with a "now'ish" timestamp).

If instead you had three weeks of data, but all the data was a year old, all the data would be rolled up because the most recent timestamp in the index is older than now - delay.

Hope that helps!

#5

Okay, I am understanding it correctly then. But that leaves me confused by my results. This job was started about a week ago.

I have had success in the past with other rollup jobs, but some have ended up like this, essentially doing nothing. Anecdotally, jobs I configure via the wizard don't work, while jobs I create through the API in the dev tools seem to work fine. We are on 6.5.

(Zachary Tong) #6

Hmm. That is indeed suspicious.

Can you paste the rollup configuration that the UI generated (GET _xpack/rollup/job/<job_id>)? I'm wondering if maybe the cron is being misconfigured by the UI, and so it isn't triggering very often?

Is there anything about rollups in the server logs?
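For anyone who prefers a script to the dev tools console, the GET request above could be issued like this. It is a minimal sketch: the base URL `http://localhost:9200` is an assumption, there is no authentication, and the helper names are hypothetical. The endpoint path is the 6.x one quoted in this post.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def rollup_job_url(base_url: str, job_id: str) -> str:
    """Build the 6.x rollup job endpoint URL for a given job id."""
    return f"{base_url.rstrip('/')}/_xpack/rollup/job/{quote(job_id, safe='')}"


def fetch_rollup_job(base_url: str, job_id: str) -> dict:
    """Fetch the job's config, status and stats as a dict (needs a reachable cluster)."""
    with urlopen(rollup_job_url(base_url, job_id), timeout=10) as resp:
        return json.load(resp)


# Assumed local cluster; adjust the base URL for your environment.
print(rollup_job_url("http://localhost:9200", "process_rollup"))
```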

#7

result of that API call

{
  "jobs" : [
    {
      "config" : {
        "id" : "process_rollup",
        "index_pattern" : "metricbeat-process-*",
        "rollup_index" : "process_rollup",
        "cron" : "0 0 22 * * ?",
        "groups" : {
          "date_histogram" : {
            "interval" : "60m",
            "field" : "@timestamp",
            "delay" : "3d",
            "time_zone" : "UTC"
          },
          "terms" : {
            "fields" : [
              "sentry.server",
              "system.process.name"
            ]
          }
        },
        "metrics" : [
          {
            "field" : "system.process.cpu.total.norm.pct",
            "metrics" : [
              "avg",
              "max",
              "min"
            ]
          },
          {
            "field" : "system.process.memory.size",
            "metrics" : [
              "avg",
              "min",
              "max"
            ]
          }
        ],
        "timeout" : "20s",
        "page_size" : 1000
      },
      "status" : {
        "job_state" : "started",
        "upgraded_doc_id" : true
      },
      "stats" : {
        "pages_processed" : 3,
        "documents_processed" : 0,
        "rollups_indexed" : 0,
        "trigger_count" : 3
      }
    }
  ]
}

(Zachary Tong) #8

Hmm. Config looks ok. Do you see anything in the server logs?

Sometimes the job can run into issues (an incorrectly mapped field, such as trying to average a string, etc.) and will log the exception.

How long has this job been "running"? trigger_count: 3 means the cron has only fired three times, so if the job has been running for longer than three days there could be something wrong with the task itself.
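As a rough aid for reading the `stats` block, the checks discussed here can be written down as a small script. This is a heuristic sketch, not an official diagnostic: the function name `summarize_rollup_stats` is hypothetical, and the sample values are copied from the response pasted in post #7. Note that the cron `0 0 22 * * ?` in that config fires once a day at 22:00, so three triggers correspond to roughly three days.

```python
def summarize_rollup_stats(job: dict) -> list:
    """Return human-readable observations about one entry of the "jobs" array."""
    stats = job["stats"]
    state = job["status"]["job_state"]
    notes = [f"state={state}, triggers={stats['trigger_count']}"]
    if stats["trigger_count"] == 0:
        notes.append("cron has not fired yet")
    elif stats["documents_processed"] == 0:
        # The job woke up but found nothing eligible (or failed): check the
        # delay, the index pattern, and the server logs.
        notes.append("cron fired but no documents were processed")
    elif stats["rollups_indexed"] == 0:
        notes.append("documents were read but no rollup documents were written")
    return notes


# Status and stats as reported in post #7.
sample_job = {
    "status": {"job_state": "started", "upgraded_doc_id": True},
    "stats": {
        "pages_processed": 3,
        "documents_processed": 0,
        "rollups_indexed": 0,
        "trigger_count": 3,
    },
}

for note in summarize_rollup_stats(sample_job):
    print(note)
```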