Elasticsearch ILM rollover not applied as expected on data streams (v8.5.2)

Dear all, I'm currently using Elasticsearch and Kibana version 8.5.2; I use the stack as a centralized logging platform. Everything works fine: Logstash sends thousands of logs through the Elastic data stream mechanism.

Everything is set up according to the documentation, BUT I don't understand why my rollover (currently set to 30 GB for max_primary_shard_size) is not applied until the index reaches 120 GB :confused: . If I force the rollover manually, the index rotation is instant.
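For reference, the manual rotation is just a rollover request against the data stream itself (not a backing index), e.g.:

POST logs-lu-ec-app_eis-default/_rollover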

Has anybody else struggled with ILM and data streams and encountered the same issue with the rollover?

Please find below my dev console output with the following commands:

GET _cat/indices/.ds-logs*eis*/?v&h=index,store.size,docs.count
GET _data_stream/logs-lu-ec-app_eis-default
GET _data_stream/logs-lu-ec-app_eis-default/_stats
GET /_ilm/status
GET .ds-logs-lu-ec-app_eis-default-2023.02.23-000001/_ilm/explain
GET .ds-logs-lu-ec-app_eis-default-2023.02.25-000002/_ilm/explain
GET .ds-logs-lu-ec-app_eis-default-2023.02.27-000003/_ilm/explain

GET _ilm/policy/logs

The output of these commands shows that the ILM policy is correctly linked to the data stream, and shows the size of each index (1 primary shard, 1 replica shard): the first index reached 120 GB, the second reached 80+ GB, and I used the _rollover API to force the index rotation.

The behaviour I expect is for the index to roll over around 30 GB; I don't mind a couple of GB more, but 4x is not what I wanted. Any ideas?

Thanks in advance for your time and help.

kr,

NB: full dev console output:

# GET _cat/indices/.ds-logs*eis*/?v&h=index,store.size,docs.count 200 OK
index                                            store.size docs.count
.ds-logs-lu-ec-app_eis-default-2023.02.23-000001    120.6gb  473374006
.ds-logs-lu-ec-app_eis-default-2023.02.25-000002     88.3gb  345972631
.ds-logs-lu-ec-app_eis-default-2023.02.27-000003      3.1gb    9883481

# GET _data_stream/logs-lu-ec-app_eis-default 200 OK
{
  "data_streams": [
    {
      "name": "logs-lu-ec-app_eis-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-logs-lu-ec-app_eis-default-2023.02.23-000001",
          "index_uuid": "uf08u4rkTyK2LSievjdMuQ"
        },
        {
          "index_name": ".ds-logs-lu-ec-app_eis-default-2023.02.25-000002",
          "index_uuid": "u1BIUrGwRparNPhD6alSFA"
        },
        {
          "index_name": ".ds-logs-lu-ec-app_eis-default-2023.02.27-000003",
          "index_uuid": "FLY2npvtQUOGAQcJ5tKwcQ"
        }
      ],
      "generation": 3,
      "_meta": {
        "owner": "xxxxxxxxxxxxxxxxxxxxxxxxx",
        "description": "Template used with .ds-logs-lu-ec-* data_streams"
      },
      "status": "GREEN",
      "template": "index_template_elkg_migration",
      "ilm_policy": "logs",
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false
    }
  ]
}
# GET _data_stream/logs-lu-ec-app_eis-default/_stats 200 OK
{
  "_shards": {
    "total": 12,
    "successful": 12,
    "failed": 0
  },
  "data_stream_count": 1,
  "backing_indices": 3,
  "total_store_size_bytes": 227775423955,
  "data_streams": [
    {
      "data_stream": "logs-lu-ec-app_eis-default",
      "backing_indices": 3,
      "store_size_bytes": 227775423955,
      "maximum_timestamp": 1677511641612
    }
  ]
}
# GET /_ilm/status 200 OK
{
  "operation_mode": "RUNNING"
}
# GET .ds-logs-lu-ec-app_eis-default-2023.02.23-000001/_ilm/explain 200 OK
{
  "indices": {
    ".ds-logs-lu-ec-app_eis-default-2023.02.23-000001": {
      "index": ".ds-logs-lu-ec-app_eis-default-2023.02.23-000001",
      "managed": true,
      "policy": "logs",
      "index_creation_date_millis": 1677155907081,
      "time_since_index_creation": "4.11d",
      "lifecycle_date_millis": 1677358492906,
      "age": "1.77d",
      "phase": "hot",
      "phase_time_millis": 1677155909555,
      "action": "complete",
      "action_time_millis": 1677358493707,
      "step": "complete",
      "step_time_millis": 1677358493707,
      "phase_execution": {
        "policy": "logs",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_primary_shard_size": "30gb",
              "max_age": "30d"
            }
          }
        },
        "version": 5,
        "modified_date_in_millis": 1677072035871
      }
    }
  }
}
# GET .ds-logs-lu-ec-app_eis-default-2023.02.25-000002/_ilm/explain 200 OK
{
  "indices": {
    ".ds-logs-lu-ec-app_eis-default-2023.02.25-000002": {
      "index": ".ds-logs-lu-ec-app_eis-default-2023.02.25-000002",
      "managed": true,
      "policy": "logs",
      "index_creation_date_millis": 1677358493003,
      "time_since_index_creation": "1.77d",
      "lifecycle_date_millis": 1677506165140,
      "age": "1.53h",
      "phase": "hot",
      "phase_time_millis": 1677358493307,
      "action": "complete",
      "action_time_millis": 1677506177350,
      "step": "complete",
      "step_time_millis": 1677506177350,
      "phase_execution": {
        "policy": "logs",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_primary_shard_size": "30gb",
              "max_age": "30d"
            }
          }
        },
        "version": 5,
        "modified_date_in_millis": 1677072035871
      }
    }
  }
}
# GET .ds-logs-lu-ec-app_eis-default-2023.02.27-000003/_ilm/explain 200 OK
{
  "indices": {
    ".ds-logs-lu-ec-app_eis-default-2023.02.27-000003": {
      "index": ".ds-logs-lu-ec-app_eis-default-2023.02.27-000003",
      "managed": true,
      "policy": "logs",
      "index_creation_date_millis": 1677506165303,
      "time_since_index_creation": "1.53h",
      "lifecycle_date_millis": 1677506165303,
      "age": "1.53h",
      "phase": "hot",
      "phase_time_millis": 1677506165740,
      "action": "rollover",
      "action_time_millis": 1677506165940,
      "step": "check-rollover-ready",
      "step_time_millis": 1677506165940,
      "phase_execution": {
        "policy": "logs",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_primary_shard_size": "30gb",
              "max_age": "30d"
            }
          }
        },
        "version": 5,
        "modified_date_in_millis": 1677072035871
      }
    }
  }
}
# GET _ilm/policy/logs 200 OK
{
  "logs": {
    "version": 5,
    "modified_date": "2023-02-22T13:20:35.871Z",
    "policy": {
      "phases": {
        "hot": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_primary_shard_size": "30gb",
              "max_age": "30d"
            }
          }
        },
        "delete": {
          "min_age": "60d",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": true
            },
            "wait_for_snapshot": {
              "policy": "daily_all.json"
            }
          }
        }
      },
      "_meta": {
        "description": "default policy for the logs index template installed by x-pack",
        "managed": true
      }
    },
    "in_use_by": {
      "indices": [
        ".ds-logs-lu-ec-tel_expressway-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sec_network_sda-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sec_network_aci-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sec_pulse-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sec_network_legacy-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sys_netapp-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sec_cisco_ise-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sys_vcsa6-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sys_printing-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sec_netscaler-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sec_netskope-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sec_testnetskope-default-2023.02.23-000001",
        ".ds-logs-lu-ec-app_lovion-default-2023.02.23-000001",
        ".ds-logs-lu-ec-sys_adaxes-default-2023.02.23-000001",
        ".ds-logs-lu-ec-app_eis-default-2023.02.27-000003",
  . . .

Are you sure your indices have only 1 primary shard and 1 replica?

The following response says that you have 3 backing indices and 12 shards.

GET _data_stream/logs-lu-ec-app_eis-default/_stats 200 OK
{
  "_shards": {
    "total": 12,
    "successful": 12,
    "failed": 0
  },
  "data_stream_count": 1,
  "backing_indices": 3,
  "total_store_size_bytes": 227775423955,
  "data_streams": [
    {
      "data_stream": "logs-lu-ec-app_eis-default",
      "backing_indices": 3,
      "store_size_bytes": 227775423955,
      "maximum_timestamp": 1677511641612
    }
  ]
}

What is the result of the following request:

GET _cat/indices/.ds-logs-lu-ec-app_eis*?v
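Asking _cat for explicit columns can also make the primary size easier to read, e.g.:

GET _cat/indices/.ds-logs-lu-ec-app_eis*?v&h=index,pri,rep,store.size,pri.store.size

pri.store.size counts only the primary shards, which is what ILM's max_primary_shard_size condition looks at.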

Hello, thanks for your quick answer. You're right: I have, indeed, 2 primary shards and 1 replica; but I still don't understand why the ILM rollover is not applied ?! :confused:

# GET _cat/indices/.ds-logs-lu-ec-app_eis*?v 200 OK
health status index                                            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .ds-logs-lu-ec-app_eis-default-2023.02.23-000001 uf08u4rkTyK2LSievjdMuQ   2   1  473374006            0    120.6gb         60.2gb
green  open   .ds-logs-lu-ec-app_eis-default-2023.02.27-000003 FLY2npvtQUOGAQcJ5tKwcQ   2   1  143650523            0     39.4gb         19.7gb
green  open   .ds-logs-lu-ec-app_eis-default-2023.02.25-000002 u1BIUrGwRparNPhD6alSFA   2   1  345972631            0     88.3gb           44gb

# GET .ds-logs-lu-ec-app_eis-default-2023.02.23-000001/_settings 200 OK
{
  ".ds-logs-lu-ec-app_eis-default-2023.02.23-000001": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "logs",
          "indexing_complete": "true"
        },
        "codec": "best_compression",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "mapping": {
          "ignore_malformed": "true"
        },
        "hidden": "true",
        "number_of_shards": "2",
        "provided_name": ".ds-logs-lu-ec-app_eis-default-2023.02.23-000001",
        "creation_date": "1677155907081",
        "number_of_replicas": "1",
        "uuid": "uf08u4rkTyK2LSievjdMuQ",
        "version": {
          "created": "8050299"
        }
      }
    }
  }
}

So you have a total of 4 shards: 2 primaries and 2 replicas.

4 x 30 GB = 120 GB, so it is working as defined.

Typically we speak of primary shards when talking about ILM.

So 2 primary shards at 30 GB each = 60 GB pri.store.size.

Which is what your data shows.

When _cat/indices shows 2 primaries and 1 replica, that means there is 1 replica for each primary, so there is a total of 4 shards: 2 primaries + 2 replicas.
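If you want to see the four shards individually, the _cat shards API lists them, one row per primary (p) or replica (r), each at roughly 30 GB:

GET _cat/shards/.ds-logs-lu-ec-app_eis-default-2023.02.23-000001?v&h=index,shard,prirep,store

Since max_primary_shard_size is checked against the largest primary shard, with 2 primaries and 1 replica the total size at rollover works out to roughly 2 x 30gb (primaries) + 2 x 30gb (replicas) = 120gb, which matches what you're seeing.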


Hello Stephen, thanks a lot for the clarification on how the size is actually displayed; my bad for not seeing that myself.

Therefore there's no need to adapt or change anything in the ILM policy; I just have to keep in mind that the total of 120 GB displayed in the Kibana interface is in fact divided across four separate shards.

For me this topic can be closed.

kr,

