ILM policies in cluster not performing rollover when conditions are met

Hello! We have an Elasticsearch cluster running version 8.19.3 (master, hot, cold, and coordinating nodes) and we're seeing some issues with our ILM policies. We use data streams, so we've configured several ILM policies to handle data rotation. In the hot phase we explicitly configure each data stream to roll over when its largest primary shard reaches 50gb or when the index is 30 days old. The problem is that Elasticsearch does not seem to perform the rollover about half of the time. We see this for all of our ILM policies; I'll leave an example below.
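For reference, these conditions can be tested on demand with a dry-run rollover; a request like the one below (using the obfuscated data stream name as a placeholder) reports which conditions are currently met without actually rolling anything over:

# POST logstash-$some-index/_rollover?dry_run=true

{
  "conditions": {
    "max_age": "30d",
    "max_primary_shard_size": "50gb"
  }
}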

This is a shard that’s over 50gb:

# GET _cat/shards/.ds*?v&s=store:desc

...
.ds-logstash-$some-index-2025.11.10-000026                                                               2     r      STARTED     28659163   56.4gb   56.4gb $ip_address $elasticsearch_node

We have cases where shards grow to 70gb or even 100gb and still do not get rolled over. I don't have a bigger example at the moment because I had to roll those indices over by hand yesterday, as they were causing performance issues.
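A manual rollover of a data stream is just the standard rollover request with no conditions, so that part was straightforward (placeholder name again):

# POST logstash-$some-index/_rollover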

These are the related ILM and template configurations for that index; nothing seems odd:

# GET .ds-logstash-$some-index-2025.11.10-000026/_ilm/explain

{
  "indices": {
    ".ds-logstash-$some-index-2025.11.10-000026": {
      "index": ".ds-logstash-$some-index-2025.11.10-000026",
      "managed": true,
      "policy": "7warm15cold40delete",
      "index_creation_date_millis": 1762815974348,
      "time_since_index_creation": "15.97h",
      "lifecycle_date_millis": 1762815974348,
      "age": "15.97h",
      "phase": "hot",
      "phase_time_millis": 1762815974461,
      "action": "rollover",
      "action_time_millis": 1762815975461,
      "step": "check-rollover-ready",
      "step_time_millis": 1762815975461,
      "phase_execution": {
        "policy": "7warm15cold40delete",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_age": "30d",
              "min_docs": 1,
              "max_primary_shard_docs": 200000000,
              "max_primary_shard_size": "50gb"
            },
            "set_priority": {
              "priority": 100
            }
          }
        },
        "version": 1,
        "modified_date_in_millis": 1759800628187
      },
      "skip": false
    }
  }
}
# GET _ilm/policy/7warm15cold40delete

{
  "7warm15cold40delete": {
    "version": 1,
    "modified_date": "2025-10-07T01:30:28.187Z",
    "policy": {
      "phases": {
        "cold": {
          "min_age": "15d",
          "actions": {
            "set_priority": {
              "priority": 0
            }
          }
        },
        "warm": {
          "min_age": "7d",
          "actions": {
            "allocate": {
              "number_of_replicas": 1,
              "include": {},
              "exclude": {},
              "require": {}
            },
            "forcemerge": {
              "max_num_segments": 1
            },
            "readonly": {},
            "set_priority": {
              "priority": 50
            }
          }
        },
        "hot": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_age": "30d",
              "max_primary_shard_size": "50gb"
            },
            "set_priority": {
              "priority": 100
            }
          }
        },
        "delete": {
          "min_age": "40d",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": true
            }
          }
        }
      }
    },
    "in_use_by": {
      "indices": [
	...
        ".ds-logstash-$some-index-2025.11.10-000026",
        ...
      ],
      "data_streams": [
	...
        "logstash-$some-index"
      ],
      "composable_templates": [
        ...
        "$index-template",
        ...
      ]
    }
  }
}

And the associated template:

{
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "7warm15cold40delete"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "mapping": {
          "total_fields": {
            "limit": "2000"
          }
        },
        "number_of_shards": "5",
        "number_of_replicas": "1"
      }
    },
    "mappings": {
      "_data_stream_timestamp": {
        "enabled": true
      },
      "properties": {
        "@timestamp": {
          "type": "date"
        }
      }
    },
    "aliases": {}
  }
}

I had to obfuscate some info such as index and template names, but the settings are all visible. We have several of these policies configured the same way; we only change the retention days in each one.

Please note that we've only recently moved to version 8, so I checked the docs in case I was configuring something that was deprecated in version 7 or similar, but I don't see the problem.

Is there something we're configuring wrong, or is something malfunctioning? We know that shards sometimes go a little over 50gb before they roll over, and that's fine. But we have shards reaching 100gb while ILM reports `"step": "check-rollover-ready"`, and about half of the time it does not perform the rollover even though the conditions are met. I appreciate any help you can provide!
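For completeness, one can also check whether ILM is stuck in an error step with the `only_errors` flag on the explain API, and retry a failed step if an index actually reports an error (index names are the obfuscated placeholders from above):

# GET .ds-logstash-*/_ilm/explain?only_errors=true

# POST .ds-logstash-$some-index-2025.11.10-000026/_ilm/retry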

Hello @Natalia_Mellino

To troubleshoot this issue we need to see all the shards for this index, so the command you should use is:

GET _cat/shards/.ds-logstash-$some-index-2025*?v&s=store:desc

This index has 5 primary shards and 1 replica of each as per the index template, so the command above should return 10 records.
A replica should ideally be the same size as its primary, but we need to know the primaries directly: what was the size of each primary shard for this index at the time of the issue? (See the example command below.)
Also, as per your description, can you confirm that for a given index rollover sometimes works and sometimes does not (you gave one example, but said this happens for all indices)?
Can you also check the master node logs to see if there are any messages related to ILM?
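For example, something like this adds a primary/replica column so you can read off each primary's size directly, sorted largest first (the index pattern is your obfuscated placeholder; `h` selects standard _cat columns):

GET _cat/shards/.ds-logstash-$some-index-*?v&h=index,shard,prirep,state,docs,store&s=store:desc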

Thanks!!