[BUG] ECK 2.14 / Elasticsearch 8.14.3: ILM Policy Rollover Settings (max_docs, max_primary_shard_docs) Incorrectly Merged with Cluster Defaults

Hadj_Hassine_Younes · December 10, 2024, 4:06pm

Current Behavior

Description

We are experiencing unexpected behavior with ILM policy rollover settings in ECK 2.14 with Elasticsearch 8.14.3. Our custom ILM policy settings are being merged with cluster default rollover settings instead of overriding them.

Environment

ECK Operator version: 2.14
Elasticsearch version: 8.14.3
Kubernetes environment


Our ILM policy is defined as:
```json
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_docs": 7500000,
            "max_primary_shard_size": "5gb"
          }
        }
      }
    }
  }
}
```

However, when checking the ILM explain API:

```
GET logstash-new-default-2024.12.10-000004/_ilm/explain
```

We get:

```json
{
  "indices": {
    "logstash-new-default-2024.12.10-000004": {
      "index": "logstash-new-default-2024.12.10-000004",
      "managed": true,
      "policy": "custom-log-lifecycle",
      "phase_execution": {
        "policy": "custom-log-lifecycle",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_age": "7d",
              "max_primary_shard_docs": 200000000,  // From cluster defaults
              "max_docs": 15000000,                 // Our setting
              "min_docs": 1,                        // From cluster defaults
              "max_primary_shard_size": "5gb"       // Our setting
            }
          }
        }
      }
    }
  }
}
```
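
To make the mismatch explicit, here is a minimal Python sketch that compares the rollover conditions from the policy as posted with the ones reported by ILM explain. The helper name `diff_conditions` is made up for illustration; the values are copied from the two JSON documents above.

```python
# Rollover conditions as written in the policy shown above.
policy_rollover = {
    "max_age": "7d",
    "max_primary_shard_docs": 7500000,
    "max_primary_shard_size": "5gb",
}

# Rollover conditions reported under phase_execution.phase_definition.
explain_rollover = {
    "max_age": "7d",
    "max_primary_shard_docs": 200000000,
    "max_docs": 15000000,
    "min_docs": 1,
    "max_primary_shard_size": "5gb",
}


def diff_conditions(expected, actual):
    """Return (changed, added, missing) between two condition dicts."""
    changed = {
        key: (expected[key], actual[key])
        for key in expected
        if key in actual and expected[key] != actual[key]
    }
    added = {key: actual[key] for key in actual if key not in expected}
    missing = {key: expected[key] for key in expected if key not in actual}
    return changed, added, missing


changed, added, missing = diff_conditions(policy_rollover, explain_rollover)
print("changed:", changed)
print("added:", added)
print("missing:", missing)
```

With the values as posted, `max_primary_shard_docs` is the only changed key, while `max_docs` and `min_docs` appear only in the explain output.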

The cluster default settings are:

```json
"cluster.lifecycle.default.rollover": "max_age=auto,max_primary_shard_size=50gb,min_docs=1,max_primary_shard_docs=200000000"
```

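The default value is a comma-separated list of `key=value` pairs. A small Python sketch for splitting it into individual conditions, so they can be compared with the explain output one by one, might look like this. The function name `parse_rollover_setting` is hypothetical, and the parsing only reflects the string format shown above, not how Elasticsearch parses it internally.

```python
def parse_rollover_setting(value):
    """Split 'k1=v1,k2=v2' into a dict, converting plain integers."""
    conditions = {}
    for pair in value.split(","):
        key, _, raw = pair.strip().partition("=")
        conditions[key] = int(raw) if raw.isdigit() else raw
    return conditions


defaults = parse_rollover_setting(
    "max_age=auto,max_primary_shard_size=50gb,"
    "min_docs=1,max_primary_shard_docs=200000000"
)
print(defaults)
```
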
Steps Taken

- Tried deleting and recreating the ILM policy
- Removed the policy from indices and reapplied it
- Stopped and started ILM
- Verified no custom persistent or transient cluster settings:

```
GET /_cluster/settings?include_defaults=false&flat_settings=true
{
  "persistent": {},
  "transient": {}
}
```

Expected Behavior

The ILM policy settings should take precedence over cluster defaults for managed indices. Our policy settings should be the only ones applied to the index rollover conditions.

Questions

1. Is this the expected behavior for ILM policies in Elasticsearch 8.14.3 running on ECK?
2. If not, how can we prevent cluster defaults from being merged with our ILM policy settings?
3. If this is expected behavior, what is the recommended way to configure custom rollover settings on ECK without them being merged with cluster defaults?

Additional Context

- The indices are managed by ILM and have proper template configuration
- We have both new and restored indices using this ILM policy
- This behavior persists even after policy recreation and ILM restart
- The issue affects all indices using this ILM policy

Impact

This merging of settings makes our rollover behavior unpredictable: ILM appears to evaluate both our configured thresholds and the cluster defaults, so it is unclear exactly when rollover will occur.
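
To illustrate the impact, here is a rough Python sketch of how a set of rollover conditions would be evaluated, assuming the rule described in the rollover API documentation (roll over when any `max_*` condition is met and all `min_*` conditions are met). The function `should_roll_over`, the unit-free numbers (days, GB), and the index statistics are all hypothetical and only serve as an example.

```python
def should_roll_over(conditions, stats):
    """Any max_* condition reached AND every min_* condition reached."""
    max_hit = any(
        stats[name[4:]] >= limit
        for name, limit in conditions.items()
        if name.startswith("max_")
    )
    min_ok = all(
        stats[name[4:]] >= limit
        for name, limit in conditions.items()
        if name.startswith("min_")
    )
    return max_hit and min_ok


# Conditions from the policy as posted (age in days, size in GB).
intended = {
    "max_age": 7,
    "max_primary_shard_docs": 7_500_000,
    "max_primary_shard_size": 5,
}

# Conditions reported by ILM explain.
reported = {
    "max_age": 7,
    "max_primary_shard_docs": 200_000_000,
    "max_docs": 15_000_000,
    "min_docs": 1,
    "max_primary_shard_size": 5,
}

# Hypothetical index with one primary shard, 2 days old, 8M docs, 3 GB.
stats = {
    "age": 2,
    "docs": 8_000_000,
    "primary_shard_docs": 8_000_000,
    "primary_shard_size": 3,
}

print("intended:", should_roll_over(intended, stats))
print("reported:", should_roll_over(reported, stats))
```

For this made-up index, the policy as written would trigger a rollover (8M docs in the primary shard exceeds 7.5M), while the conditions reported by explain would not.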

Additional Solution Attempted

We tried aligning the cluster default settings with our desired ILM policy settings by updating cluster default rollover settings:

```
PUT /_cluster/settings
{
  "persistent": {
    "cluster.lifecycle.default.rollover": "max_age=7d,max_primary_shard_docs=7500000,max_primary_shard_size=5gb"
  }
}
```

Verified the settings were accepted:

```json
"persistent": {
  "cluster": {
    "lifecycle": {
      "default": {
        "rollover": "max_age=7d,max_primary_shard_docs=7500000,max_primary_shard_size=5gb"
      }
    }
  }
}
```
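
The response is nested, whereas the PUT request used the flat dotted key. A small Python sketch for flattening the nested form so the two can be compared directly (the helper `flatten_settings` is hypothetical):

```python
def flatten_settings(nested, prefix=""):
    """Turn {'a': {'b': 'v'}} into {'a.b': 'v'}."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_settings(value, path))
        else:
            flat[path] = value
    return flat


persistent = {
    "cluster": {
        "lifecycle": {
            "default": {
                "rollover": "max_age=7d,max_primary_shard_docs=7500000,max_primary_shard_size=5gb"
            }
        }
    }
}

flat = flatten_settings(persistent)
print(flat)
```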

Followed up with:

1. Stopped ILM
2. Removed policy from index
3. Started ILM
4. Reapplied policy to index
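
For anyone trying to reproduce this, the sequence above can be sketched as the following list of REST calls. The exact requests were not recorded here, so this Python snippet is an assumption about which ILM endpoints correspond to each step; it only builds the request plan and does not send anything. The function name `reapply_policy_plan` is made up.

```python
def reapply_policy_plan(index, policy):
    """Return (method, path, body) tuples for the four steps, in order."""
    return [
        ("POST", "/_ilm/stop", None),                 # stop ILM
        ("POST", f"/{index}/_ilm/remove", None),      # remove policy from index
        ("POST", "/_ilm/start", None),                # start ILM
        ("PUT", f"/{index}/_settings",                # reapply policy to index
         {"index.lifecycle.name": policy}),
    ]


plan = reapply_policy_plan(
    "logstash-new-default-2024.12.10-000004", "custom-log-lifecycle"
)
for method, path, body in plan:
    print(method, path, body if body is not None else "")
```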

However, ILM explain still shows the old values:

```json
"rollover": {
  "max_age": "7d",
  "min_docs": 1,
  "max_primary_shard_docs": 200000000,  // Still showing old value despite cluster setting change
  "max_docs": 15000000,
  "max_primary_shard_size": "5gb"
}
```

This demonstrates that even when aligning cluster default settings with desired values, the old settings persist in the ILM execution, suggesting a deeper issue with how settings are being applied or cached by the ECK operator.

Note:

I used the Elasticsearch API to set the ILM policy, so it is very unlikely that the operator is involved. This looks like something in Elasticsearch itself, either a misuse on our side or a bug.
