ILM rollover not working on max_size?

I have an ILM policy with both max_age and max_size specified. Rollover happens as expected about ten minutes after max_age, but does not seem to happen on max_size. With max_size set to 10MB I've seen shard sizes up to 40MB or so. With max_size set to 100MB I've seen the shard size up to 140MB. In all cases max_age then appeared to trigger the rollover (or I restarted the test for some other reason).

Does max_size work? What do I have to do to make it work?

Just in case I have a syntax or typing error I've failed to notice, the policy, from GET _ilm/policy, is

  "filebeat-ilm-policy" : {
    "version" : 1,
    "modified_date" : "2021-10-29T12:39:06.939Z",
    "policy" : {
      "phases" : {
        "hot" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "100mb",
              "max_age" : "1d"
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "delete" : {
          "min_age" : "7d",
          "actions" : {
            "delete" : {
              "delete_searchable_snapshot" : true
            }
          }
        }
      }
    }
  }

ILM does not check the rollover conditions continuously. It checks them once per poll interval, which is 10 minutes by default.

So between two checks the index can grow past the threshold you set.
This behavior is expected.
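
To see where ILM currently stands for an index, the explain API may help. A minimal sketch, assuming the indices match the pattern filebeat-* (the pattern is my assumption, adjust it to your index names):

```
GET filebeat-*/_ilm/explain
```

The response lists the current phase, action and step for each managed index, so you can tell whether an index is simply waiting on the rollover check or has hit an error in some other step.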

You can change the ILM poll interval if needed:

PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "5s"
  }
}

But I'd not do this in production as the default values seem reasonable to me.
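
If you do lower the interval for a test, it is probably worth putting it back afterwards. Setting the value to null should remove the transient override so that the default applies again:

```
PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": null
  }
}
```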

That, I'm afraid, though the first thing I thought of, appears not to be the answer. I've seen shard size well above max_size for several tens of minutes. Watching the ilm_history index confirms that ILM does seem to be running every ten minutes (except when everything is in check_rollover_ready state, when nothing at all appears in ilm_history until something reaches max_age.)

Rollover should typically be thought of in terms of GBs, not MBs, so in real production systems the overage tends to be a small percentage of the index size. How large it gets also depends on the indexing rate, since the index keeps growing between two ILM checks.

As @dadoonet said, ILM is not built to roll over on exact byte counts; it is a background task.
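
When comparing against max_size, it may also help to look at the primary store size of the whole index rather than a single shard. As far as I know, max_size is evaluated against the total size of the index's primary shards, and replicas are not counted. A sketch, again assuming a filebeat-* index pattern:

```
GET _cat/indices/filebeat-*?v&h=index,pri,rep,docs.count,pri.store.size,store.size
```

The pri.store.size column is the one to compare with max_size; store.size also includes replicas.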
