[BUG] ECK 2.14 / Elasticsearch 8.14.3: ILM Policy Rollover Settings (max_docs, max_primary_shard_docs) Incorrectly Merged with Cluster Defaults

Current Behavior

Description

We are experiencing unexpected behavior with ILM policy rollover settings in ECK 2.14 with Elasticsearch 8.14.3. Our custom ILM policy settings are being merged with cluster default rollover settings instead of overriding them.

Environment

  • ECK Operator version: 2.14
  • Elasticsearch version: 8.14.3
  • Kubernetes environment

Our ILM policy is defined as:
```json
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_docs": 7500000,
            "max_primary_shard_size": "5gb"
          }
        }
      }
    }
  }
}
```

However, when checking the ILM explain API:

GET logstash-new-default-2024.12.10-000004/_ilm/explain

We get:

{
  "indices": {
    "logstash-new-default-2024.12.10-000004": {
      "index": "logstash-new-default-2024.12.10-000004",
      "managed": true,
      "policy": "custom-log-lifecycle",
      "phase_execution": {
        "policy": "custom-log-lifecycle",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_age": "7d",
              "max_primary_shard_docs": 200000000,  // From cluster defaults
              "max_docs": 15000000,                 // Our setting
              "min_docs": 1,                        // From cluster defaults
              "max_primary_shard_size": "5gb"       // Our setting
            }
          }
        }
      }
    }
  }
}
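
To make the discrepancy explicit, the two sets of rollover conditions can be compared key by key. The following is only a minimal sketch: the dictionaries are copied by hand from the policy and the explain output above, and `diff_conditions` is a hypothetical helper name, not part of any Elasticsearch client.

```python
# Rollover conditions as written in the policy (copied from the post).
policy_rollover = {
    "max_age": "7d",
    "max_primary_shard_docs": 7500000,
    "max_primary_shard_size": "5gb",
}

# Rollover conditions reported under phase_definition by _ilm/explain
# (copied from the post).
explain_rollover = {
    "max_age": "7d",
    "max_primary_shard_docs": 200000000,
    "max_docs": 15000000,
    "min_docs": 1,
    "max_primary_shard_size": "5gb",
}


def diff_conditions(expected, actual):
    """Return {key: (expected_value, actual_value)} for every mismatch.

    A key that is missing on one side is reported with None on that side.
    """
    keys = sorted(set(expected) | set(actual))
    return {
        k: (expected.get(k), actual.get(k))
        for k in keys
        if expected.get(k) != actual.get(k)
    }


for key, (want, got) in diff_conditions(policy_rollover, explain_rollover).items():
    print(f"{key}: policy={want!r} explain={got!r}")
```

With the values above, three keys differ: `max_docs` and `min_docs` appear only in the explain output, and `max_primary_shard_docs` has a different value. Note that `max_docs` is not present in the policy as posted.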

The cluster default settings are:

"cluster.lifecycle.default.rollover": "max_age=auto,max_primary_shard_size=50gb,min_docs=1,max_primary_shard_docs=200000000"
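
To see which of the explain values coincide with this default, the setting string can be split into key/value pairs and compared. This is a rough sketch, not Elasticsearch's own parser: `parse_rollover_setting` is a hypothetical helper that assumes the simple `key=value,key=value` layout shown above.

```python
def parse_rollover_setting(value):
    """Split a 'key=value,key=value' string into a dict of strings."""
    conditions = {}
    for pair in value.split(","):
        key, _, raw = pair.strip().partition("=")
        conditions[key] = raw
    return conditions


# Default value of cluster.lifecycle.default.rollover (copied from the post).
default_setting = (
    "max_age=auto,max_primary_shard_size=50gb,"
    "min_docs=1,max_primary_shard_docs=200000000"
)
defaults = parse_rollover_setting(default_setting)

# Rollover conditions reported by _ilm/explain (copied from the post).
explain_rollover = {
    "max_age": "7d",
    "max_primary_shard_docs": 200000000,
    "max_docs": 15000000,
    "min_docs": 1,
    "max_primary_shard_size": "5gb",
}

# Explain conditions whose value is identical to the cluster default.
matching = {
    k: v for k, v in explain_rollover.items() if defaults.get(k) == str(v)
}
print(matching)
```

Only `max_primary_shard_docs` and `min_docs` match the default string; `max_age` and `max_primary_shard_size` carry the policy values.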

Steps Taken

  1. Tried deleting and recreating the ILM policy
  2. Removed the policy from indices and reapplied it
  3. Stopped and started ILM
  4. Verified no custom persistent or transient cluster settings:
GET /_cluster/settings?include_defaults=false&flat_settings=true
{
  "persistent": {},
  "transient": {}
}
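
The steps above map roughly onto the requests below. This is a sketch under assumptions: it uses the standard ILM REST endpoints, the helper name `troubleshooting_requests` is hypothetical, and the exact calls that were run are not shown in this post, so they may have differed. The function only builds the request list and does not contact a cluster.

```python
def troubleshooting_requests(index, policy, policy_body):
    """Return (method, path, body) tuples for the troubleshooting steps."""
    return [
        # 1. Delete and recreate the ILM policy.
        ("DELETE", f"/_ilm/policy/{policy}", None),
        ("PUT", f"/_ilm/policy/{policy}", policy_body),
        # 2. Remove the policy from the index, then reapply it.
        ("POST", f"/{index}/_ilm/remove", None),
        ("PUT", f"/{index}/_settings", {"index.lifecycle.name": policy}),
        # 3. Stop and start ILM.
        ("POST", "/_ilm/stop", None),
        ("POST", "/_ilm/start", None),
        # 4. Check for custom persistent or transient cluster settings.
        (
            "GET",
            "/_cluster/settings?include_defaults=false&flat_settings=true",
            None,
        ),
    ]


# Index and policy names are taken from the explain output in this post.
steps = troubleshooting_requests(
    "logstash-new-default-2024.12.10-000004",
    "custom-log-lifecycle",
    {"policy": {"phases": {}}},  # placeholder body, not the real policy
)
for method, path, _body in steps:
    print(method, path)
```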

Expected Behavior

The ILM policy settings should take precedence over cluster defaults for managed indices. Our policy settings should be the only ones applied to the index rollover conditions.

Questions

  1. Is this the expected behavior for ILM policies in Elasticsearch 8.14.3 running under ECK 2.14?
  2. If not, how can we prevent cluster defaults from being merged with our ILM policy settings?
  3. If this is expected behavior, what's the recommended way to configure custom rollover settings in ECK without them being merged with cluster defaults?

Additional Context

  • The indices are managed by ILM and have proper template configuration
  • We have both new and restored indices using this ILM policy
  • This behavior persists even after policy recreation and ILM restart
  • The issue affects all indices using this ILM policy

Impact

Because the explain output mixes our configured thresholds with the cluster defaults, it is unclear which conditions will actually trigger a rollover and when.

Additional Solution Attempted

We tried aligning the cluster defaults with our desired ILM policy values by updating the cluster default rollover setting:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.lifecycle.default.rollover": "max_age=7d,max_primary_shard_docs=7500000,max_primary_shard_size=5gb"
  }
}

Verified the settings were accepted:

"persistent": {
  "cluster": {
    "lifecycle": {
      "default": {
        "rollover": "max_age=7d,max_primary_shard_docs=7500000,max_primary_shard_size=5gb"
      }
    }
  }
}
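
The nested form of this response can be checked against the dotted key used in the PUT request by flattening it. This is a small sketch: `flatten` is a hypothetical helper, and the `persistent` dictionary is copied by hand from the response above.

```python
def flatten(settings, prefix=""):
    """Turn nested settings into a flat {dotted.key: value} dict."""
    flat = {}
    for key, value in settings.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Persistent settings as returned by the cluster (copied from the post).
persistent = {
    "cluster": {
        "lifecycle": {
            "default": {
                "rollover": "max_age=7d,max_primary_shard_docs=7500000,max_primary_shard_size=5gb"
            }
        }
    }
}

# The value that was sent in the PUT /_cluster/settings request.
requested = "max_age=7d,max_primary_shard_docs=7500000,max_primary_shard_size=5gb"

flat = flatten(persistent)
print(flat["cluster.lifecycle.default.rollover"] == requested)
```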

Followed up with:

  • Stopped ILM
  • Removed policy from index
  • Started ILM
  • Reapplied policy to index

However, ILM explain still shows the old values:

"rollover": {
  "max_age": "7d",
  "min_docs": 1,
  "max_primary_shard_docs": 200000000,  // Still showing old value despite cluster setting change
  "max_docs": 15000000,
  "max_primary_shard_size": "5gb"
}

This demonstrates that even when aligning cluster default settings with desired values, the old settings persist in the ILM execution, suggesting a deeper issue with how settings are being applied or cached by the ECK operator.

Note:

I used the Elasticsearch API to set the ILM policy, so it is very unlikely that the operator is involved. This looks like something in Elasticsearch itself, either a misuse or a bug.
