APM index failing ILM rollover

Hi, I have APM set up and I thought I had configured it correctly for ILM. It rolled over once, from 000001 to 000002, but now the index won't roll over and I get the following error:

illegal_argument_exception: setting [index.lifecycle.rollover_alias] for index [apm-6.7.1-span-000002] is empty or not defined

Could anyone offer any help or advice on debugging and getting my rollovers working?
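For anyone debugging the same message: before ILM can roll an index over, the index needs the index.lifecycle.rollover_alias setting, and that alias should point at the index (and not have is_write_index set to false on it). The sketch below roughly mirrors those checks against plain dictionaries shaped like the responses of GET <index>/_settings?flat_settings=true and GET <index>/_alias. The helper name check_rollover_ready is hypothetical, it only approximates ILM's own validation, and no client library or cluster connection is assumed.

```python
from typing import Any, Dict, List


def check_rollover_ready(index: str,
                         settings_resp: Dict[str, Any],
                         alias_resp: Dict[str, Any]) -> List[str]:
    """Return the problems that would stop an ILM rollover (empty list = OK).

    Hypothetical helper: settings_resp mimics GET <index>/_settings?flat_settings=true
    and alias_resp mimics GET <index>/_alias. Approximates ILM's checks only.
    """
    problems: List[str] = []
    settings = settings_resp.get(index, {}).get("settings", {})
    alias = settings.get("index.lifecycle.rollover_alias", "")
    if not alias:
        # This is the situation behind the error in the question.
        problems.append(f"index.lifecycle.rollover_alias is empty or not defined for {index}")
        return problems

    aliases = alias_resp.get(index, {}).get("aliases", {})
    if alias not in aliases:
        problems.append(f"alias {alias} does not point to {index}")
    elif aliases[alias].get("is_write_index") is False:
        problems.append(f"{index} is not the write index for {alias}")
    return problems


# Usage: settings with an ILM policy name but no rollover alias.
broken = {"apm-6.7.1-span-000002": {"settings": {"index.lifecycle.name": "apm-6.7.1-span"}}}
print(check_rollover_ready("apm-6.7.1-span-000002", broken, {}))
```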

Hello!

Sorry to hear that. I'm assuming you are using APM Server and Elasticsearch 7.3; is that correct?
Can we see your apm-server.yml, index templates, and ILM policies?

Hi, yes, we're using APM Server, and our Elasticsearch version is 6.7.1. Our index template looks like this:

{
  "apm-6.7.1-span" : {
    "order" : 2,
    "index_patterns" : [
      "apm-6.7.1-span-*"
    ],
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "apm-6.7.1-span",
          "rollover_alias" : "apm-6.7.1-span"
        }
      }
    },
    "mappings" : { },
    "aliases" : { }
  }
}
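An index created by an ILM rollover (such as apm-6.7.1-span-000002) takes its settings from whichever index templates match its name, so it can help to confirm what a new index would actually inherit. The sketch below merges legacy templates the way the template documentation describes: every template whose pattern matches is applied, lower order first, with higher order overriding. The input mimics the GET _template response above; the function names are hypothetical, and Python's fnmatchcase is only a stand-in for Elasticsearch's "*" wildcard matching.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def flatten(d: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn {"index": {"lifecycle": {"name": x}}} into {"index.lifecycle.name": x}."""
    out: Dict[str, Any] = {}
    for key, value in d.items():
        full = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, full))
        else:
            out[full] = value
    return out


def effective_settings(index_name: str, templates: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the settings of every legacy template whose pattern matches index_name.

    Templates are applied in ascending "order", so higher-order templates win.
    fnmatchcase also interprets "?" and "[...]", which Elasticsearch patterns do not.
    """
    matching = [
        t for t in templates.values()
        if any(fnmatchcase(index_name, p) for p in t.get("index_patterns", []))
    ]
    merged: Dict[str, Any] = {}
    for t in sorted(matching, key=lambda t: t.get("order", 0)):
        merged.update(flatten(t.get("settings", {})))
    return merged


# Usage with the template from this thread.
thread_templates = {
    "apm-6.7.1-span": {
        "order": 2,
        "index_patterns": ["apm-6.7.1-span-*"],
        "settings": {"index": {"lifecycle": {"name": "apm-6.7.1-span",
                                             "rollover_alias": "apm-6.7.1-span"}}},
    }
}
print(effective_settings("apm-6.7.1-span-000002", thread_templates))
```

If a second, higher-order template matching the same names set an empty rollover_alias, it would win the merge, which is one thing worth ruling out when this error appears.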

And this is the ILM policy (the phase definition the index is currently running):

{
  "policy": "apm-6.7.1-span",
  "phase_definition": {
    "min_age": "0ms",
    "actions": {
      "rollover": {
        "max_size": "100mb",
        "max_age": "1d"
      },
      "set_priority": {
        "priority": 100
      }
    }
  },
  "version": 3,
  "modified_date_in_millis": 1567589561158
}
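The phase_definition above is the hot phase of the policy (the rollover action is only allowed in the hot phase), not a complete request body. If the policy ever needs to be recreated or tweaked, a minimal sketch like the one below wraps that phase into the body for PUT _ilm/policy/apm-6.7.1-span. The helper name build_policy_body is hypothetical; keep in mind that a PUT replaces the whole existing policy.

```python
import json
from typing import Any, Dict

# The "phase_definition" from the thread, i.e. the hot phase of the policy.
hot_phase: Dict[str, Any] = {
    "min_age": "0ms",
    "actions": {
        "rollover": {"max_size": "100mb", "max_age": "1d"},
        "set_priority": {"priority": 100},
    },
}


def build_policy_body(hot: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a hot-phase definition into a PUT _ilm/policy/<name> request body."""
    rollover = hot.get("actions", {}).get("rollover", {})
    # The rollover action needs at least one condition to be set.
    if not any(k in rollover for k in ("max_size", "max_age", "max_docs")):
        raise ValueError("rollover needs at least one of max_size, max_age, max_docs")
    return {"policy": {"phases": {"hot": hot}}}


# Print a request that could be pasted into the Kibana console.
print("PUT _ilm/policy/apm-6.7.1-span")
print(json.dumps(build_policy_body(hot_phase), indent=2))
```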

I've been able to clear the error I was getting yesterday by creating a new alias:

PUT apm-6.7.1-span-000001 
{
  "aliases": {
    "apm-6.7.1-span":{
      "is_write_index": true 
    }
  }
}
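With is_write_index set to true, as in the request above, each rollover creates the next index in the sequence and moves the write flag to it, while the old index stays behind the alias for searches. The sketch below models that on a plain dictionary. It follows the naming rule from the rollover API documentation (increment the trailing number, zero-padded to six digits), which is also why ILM expects names ending in -<number>; the helper names are hypothetical.

```python
import re
from typing import Dict

# Names like "apm-6.7.1-span-000001": everything up to the last "-", then digits.
_ROLLOVER_NAME = re.compile(r"(.*-)(\d+)")


def next_rollover_index(name: str) -> str:
    """Name the index that a rollover of `name` would create.

    Increments the trailing number and zero-pads it to six digits.
    """
    match = _ROLLOVER_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"index name [{name}] does not end with -<number>")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1:06d}"


def simulate_rollover(write_flags: Dict[str, bool], current: str) -> str:
    """Apply a rollover to a {index: is_write_index} map for one alias."""
    new_index = next_rollover_index(current)
    write_flags[current] = False   # old index stays searchable via the alias
    write_flags[new_index] = True  # new index receives the writes
    return new_index


flags = {"apm-6.7.1-span-000001": True}
print(simulate_rollover(flags, "apm-6.7.1-span-000001"), flags)
```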

So now the error message is gone. The issue I have now is that, as you can see, I've set the index to roll over after 100mb. It's past that threshold and its current action shows rollover, but it doesn't look like it's actually rolling over: it's now up to 130mb and is still trying to do the rollover.

The index has rolled over now. Is there any indication of how long a rollover takes? It doesn't seem right that an index can reach double its policy limit (200mb in this case) before the rollover completes and a new index is opened.

Glad you found the issue; you did indeed need that alias!

max_size should be treated as an approximate threshold, not an exact value. People often use much larger values for that setting, and when indices are many gigabytes in size, a deviation of 100 MB doesn't make much difference.

As a matter of fact (unless you were only testing), if you want to avoid too much fragmentation you might consider bumping that up quite a bit: somewhere between 10 and 50 GB is very reasonable.
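To put the fragmentation point in numbers, here is a rough back-of-the-envelope sketch of how many indices a rollover policy creates per day. It assumes steady ingest, the max_age of 1d from the policy above, and a purely hypothetical volume of 20 GB of span data per day; real counts will drift because max_size is only checked periodically.

```python
def indices_per_day(daily_mb: int, max_size_mb: int) -> int:
    """Rough count of indices created per day by a rollover policy.

    Assumes steady ingest and a max_age of 1d, so there is at least one
    rollover per day even when max_size is never reached.
    """
    if max_size_mb <= 0:
        raise ValueError("max_size_mb must be positive")
    by_size = -(-daily_mb // max_size_mb)  # ceiling division on integers
    return max(1, by_size)


DAILY_MB = 20 * 1024  # hypothetical 20 GB of APM span data per day
for label, size_mb in [("100mb", 100), ("10gb", 10 * 1024), ("50gb", 50 * 1024)]:
    print(label, indices_per_day(DAILY_MB, size_mb))
```

Under those assumptions a 100mb threshold produces a couple of hundred small indices a day, while 10 to 50 GB keeps it to one or two.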

A rollover doesn't take much time, but ILM only checks the rollover conditions every 10 minutes by default. You can change that interval like this if you wish:

PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "1m" 
  }
}
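To see why an index can overshoot its threshold, here is a rough model of that polling behaviour. It assumes a constant ingest rate (the 6 MB per minute used below is hypothetical) and that ILM checks exactly once per poll interval, starting when the index is created. One more thing worth checking: max_size is measured against primary shards only, so if the sizes being watched include a replica, they will look roughly twice as large.

```python
from typing import Tuple


def size_at_rollover(max_size_mb: int, rate_mb_per_min: int,
                     poll_interval_min: int, replicas: int = 0) -> Tuple[int, int]:
    """Return (primary_mb, total_mb) when ILM first sees the index over max_size.

    Assumes a constant, hypothetical ingest rate and checks every
    poll_interval_min minutes from index creation. total_mb adds replicas.
    """
    if rate_mb_per_min <= 0 or poll_interval_min <= 0:
        raise ValueError("rate and poll interval must be positive")
    minute = 0
    primary_mb = 0
    while primary_mb < max_size_mb:
        minute += poll_interval_min       # wait for the next ILM check
        primary_mb = rate_mb_per_min * minute
    return primary_mb, primary_mb * (1 + replicas)


print(size_at_rollover(100, 6, poll_interval_min=10))              # default 10m
print(size_at_rollover(100, 6, poll_interval_min=10, replicas=1))  # incl. replica
print(size_at_rollover(100, 6, poll_interval_min=1))               # 1m interval
```

The worst-case overshoot is roughly the ingest rate multiplied by the poll interval, which is why shortening the interval helps when testing with very small thresholds.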

Hope this helps!

Thanks for your response. Yes, the 100mb was just for testing the rollover; we wouldn't use such a strict size policy in practice, since we'll be rolling over at 50 GB in a cluster with about 8 TB of space. So that's good news. Thanks for the help!