Index not rolling over

Hi,

I have set up Elasticsearch and created the following policy, index template and alias, but the index will not roll over when a limit (size in this case) is reached.

Policy

{
  "policy" : {
    "phases" : {
      "hot" : {
        "min_age" : "0ms",
        "actions" : {
          "rollover" : {
            "max_size" : "50gb"
          },
          "set_priority" : {
            "priority" : 100
          }
        }
      },
      "delete" : {
        "min_age" : "90d",
        "actions" : {
          "delete" : {
            "delete_searchable_snapshot" : true
          }
        }
      }
    }
  }
}

Index template

{
  "index_patterns": ["gold-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index": {
      "lifecycle.name": "gold-policy",
      "lifecycle.rollover_alias": "gold"
    }
  }
}

Alias

{
  "aliases": {
    "gold":{
      "is_write_index": true
    }
  }
}

Right now, the index is linked to the policy and is healthy, but as you can see it has crossed the size limit and has not rolled over.

green open gold-000001 XXXXX 1 0 222151235      0 95.9gb 95.9gb

I made sure the ILM status was running

{"operation_mode":"RUNNING"}

Could someone help me get this working?

Thank you
Ed

We see the same kind of problems in version 7.10. Either the index does not roll over, or newly created indices are not listed as managed, or the delete action is ignored completely once the phase has changed for an existing index. We tried putting the delete action back into the policy, as suggested for an issue found on GitHub, but that does not work either. ILM is in the RUNNING state. The Beats create the indices again, but only sometimes create the index alias with a date. Nothing works as expected.

Thanks @antonkoenig. Does anyone else have the same problem, or a solution to the problem or my config? This is a serious problem with large quantities of logs.

What's the current status of the policy?

Hi @warkolm, good instincts :wink:

{
  "error" : "Incorrect HTTP method for uri [/_ilm/explain?pretty] and method [GET], allowed: [POST]",
  "status" : 405
}

I had a quick look and I saw a post about this error happening due to a trailing error in a URL but I can't find where. Would it be in my pipeline?

Thank you for your help!

What was the command you ran?

curl -X GET -H 'Content-Type: application/json' "localhost:9200/_ilm/explain?pretty"
{
  "error" : "Incorrect HTTP method for uri [/_ilm/explain?pretty] and method [GET], allowed: [POST]",
  "status" : 405
}

Per the docs you need to include the index name - GET <target>/_ilm/explain.

I have a policy named gold-policy:

$ curl -X GET -H 'Content-Type: application/json' "localhost:9200/_ilm/policy/gold-policy?pretty"
{
  "gold-policy" : {
    "version" : 4,
    "modified_date" : "2021-02-19T11:16:15.136Z",
    "policy" : {
      "phases" : {
        "hot" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "50gb"
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "delete" : {
          "min_age" : "90d",
          "actions" : {
            "delete" : {
              "delete_searchable_snapshot" : true
            }
          }
        }
      }
    }
  }
}

but I get the following when I run

$ curl -X GET -H 'Content-Type: application/json' "localhost:9200/gold-policy/_ilm/explain?pretty"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [gold-policy]",
        "resource.type" : "index_or_alias",
        "resource.id" : "gold-policy",
        "index_uuid" : "_na_",
        "index" : "gold-policy"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [gold-policy]",
    "resource.type" : "index_or_alias",
    "resource.id" : "gold-policy",
    "index_uuid" : "_na_",
    "index" : "gold-policy"
  },
  "status" : 404
}

Try GET gold/_ilm/explain.

:blush:

$ curl -X GET -H 'Content-Type: application/json' "localhost:9200/gold/_ilm/explain?pretty"
{
  "indices" : {
    "gold-000001" : {
      "index" : "gold-000001",
      "managed" : true,
      "policy" : "gold-policy",
      "lifecycle_date_millis" : 1608617718571,
      "age" : "61.98d",
      "phase" : "hot",
      "phase_time_millis" : 1608617718821,
      "action" : "set_priority",
      "action_time_millis" : 1608617718821,
      "step" : "set_priority",
      "step_time_millis" : 1608617718821,
      "phase_execution" : {
        "policy" : "gold-policy",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "50gb"
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "version" : 4,
        "modified_date_in_millis" : 1613733375136
      }
    }
  }
}

Hi @warkolm, I had a look at my setup but I can't see anything out of place. Do you see anything odd in the way I set up the rollover or in the outputs above? I could do a manual rollover but I think that will just delay the problem.

Maybe the problem is the way we set it up? Right now it runs in Kubernetes and we have an initContainer creating the policy, index and alias. Once that is done, we spin up the actual container, which finds the setup is present and starts writing to the index directly.

The explain status looks ok here.

Here are some examples. The first statements clean up while the Metricbeat instances are already running and sending data. You might see the problems yourself when you try different combinations:


DELETE metricbeat-*
DELETE _ilm/policy/metricbeat_policy_custom
DELETE _ilm/policy/metricbeat

DELETE _data_stream/metricbeat-*
DELETE .ds-metricbeat-*
DELETE _index_template/metricbeat_template_custom

PUT _ilm/policy/metricbeat
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "50gb"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "0d",
        "actions": {}
      }
    }
  }
}

PUT _ilm/policy/metricbeat_policy_custom
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "1m",
            "max_size": "1mb",
            "max_docs": 100
          }
        }
      },
      "warm": {
        "actions": {}
      },
      "delete": {
        "min_age": "0ms",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}

PUT _index_template/metricbeat_template_custom
{
  "index_patterns": ["metricbeat-*"],                   
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "metricbeat_policy_custom",
      "index.lifecycle.rollover_alias": "metricbeat"
    },
    "aliases": {
      "metricbeat": {
        "is_write_index": true
      }
    }
  }
}

GET _ilm/policy/metricbeat_policy_custom

GET _index_template/metricbeat_template_custom

GET metricbeat-*/_ilm/explain

Hi @antonkoenig, thanks for lending a hand.

I did the test this morning. I applied your custom policy and custom template to a local stack and added an index as shown below:

PUT metricbeat-000001/_doc/1
{ 
    "title" : "How to Ingest Into Elasticsearch Service", 
    "date" : "2019-08-15T14:12:12", 
    "description" : "This is an overview article about the various ways to ingest into Elasticsearch Service" 
}

I then ran GET metricbeat-*/_ilm/explain and got

{
  "indices" : {
    "metricbeat-000001" : {
      "index" : "metricbeat-000001",
      "managed" : true,
      "policy" : "metricbeat_policy_custom",
      "lifecycle_date_millis" : 1614150419906,
      "age" : "5.1m",
      "phase" : "hot",
      "phase_time_millis" : 1614150420008,
      "action" : "unfollow",
      "action_time_millis" : 1614150420008,
      "step" : "wait-for-follow-shard-tasks",
      "step_time_millis" : 1614150420050,
      "phase_execution" : {
        "policy" : "metricbeat_policy_custom",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "1mb",
              "max_age" : "1m",
              "max_docs" : 100
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1614150249236
      }
    }
  }
}

We can clearly see that the index does not respect the rollover max_age of 1 minute.

Your template helped me merge the alias definition into the template. :smiley:

Glad it helped you to merge the alias definition.

I think how often ILM checks the rollover conditions is configurable (the indices.lifecycle.poll_interval setting), and it defaults to 10 minutes. So 1 minute may be too short, but this short time span is for testing purposes.
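
For anyone testing with very short rollover conditions: ILM evaluates policies periodically rather than continuously, and the period is a cluster setting. Below is a minimal sketch of lowering it on a test cluster, assuming Elasticsearch is reachable at localhost:9200 without authentication, as in the curl commands earlier in the thread. The ES_URL and POLL_INTERVAL_BODY variable names are only illustrative.

```shell
# Assumption: a test cluster at localhost:9200 without authentication.
ES_URL="${ES_URL:-localhost:9200}"

# Ask ILM to check rollover conditions every minute instead of every 10 minutes.
POLL_INTERVAL_BODY='{"persistent":{"indices.lifecycle.poll_interval":"1m"}}'

curl -s --max-time 5 -X PUT -H 'Content-Type: application/json' \
  "$ES_URL/_cluster/settings?pretty" -d "$POLL_INTERVAL_BODY" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2
```

Such a short interval is only sensible for testing. Sending the same request with the value set to null restores the default.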

When we test these configurations, sometimes the index does not show that it's managed or other problems show up. Sometimes the index is in the delete phase, but it does not delete. Maybe some elastic guru knows what's going on?

I tried one of the guides from the elastic docs, but that guide creates a data stream. Our example does show the cleanup DELETE statements for that case also.

If ILM does not work, we may have to use Curator or something else. But as I understand it, this is a job for ILM and it should work already. Sadly it does not work.

Hi,

The following worked for me. I set max_docs to 1000, and although the index had grown to 100,000 documents, at least it ended up rolling over. The commands that helped me get the config working were GET gold-000011/_ilm/explain, which gave me the status of the index, and POST /gold/_rollover, which let me see the error I was getting when trying to roll over.
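
For anyone following along, those two checks can be sketched with curl roughly as follows, assuming the same unauthenticated localhost:9200 endpoint used earlier in the thread. The dry_run flag is an addition to the commands mentioned above: it asks the rollover API to evaluate the given conditions without creating a new index. ES_URL and ROLLOVER_CONDITIONS are illustrative names.

```shell
# Assumption: cluster at localhost:9200 without authentication.
ES_URL="${ES_URL:-localhost:9200}"

# Conditions to test; this mirrors the max_docs value of 1000 mentioned above.
ROLLOVER_CONDITIONS='{"conditions":{"max_docs":1000}}'

# 1. Show the ILM phase, action and step for the indices behind the alias.
curl -s --max-time 5 "$ES_URL/gold/_ilm/explain?pretty" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2

# 2. Dry run: report whether the conditions are met, without rolling over.
curl -s --max-time 5 -X POST -H 'Content-Type: application/json' \
  "$ES_URL/gold/_rollover?dry_run&pretty" -d "$ROLLOVER_CONDITIONS" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2
```

Dropping dry_run and the request body gives the plain POST /gold/_rollover, which either performs the rollover or returns the error that explains why it cannot.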

My final config (pretty much unchanged from my first post) is as follows. I hope it can help someone.

PUT _ilm/policy/gold_policy
{
  "policy" : {
    "phases" : {
      "hot" : {
        "min_age" : "0ms",
        "actions" : {
          "rollover" : {
            "max_docs": 1000
          },
          "set_priority" : {
            "priority" : 100
          }
        }
      },
      "delete" : {
        "min_age" : "120s",
        "actions" : {
          "delete" : {
            "delete_searchable_snapshot" : true
          }
        }
      }
    }
  }
}

PUT /_index_template/gold_template
{
  "index_patterns": ["gold*"],                   
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "index.lifecycle.name": "gold_policy",
      "index.lifecycle.rollover_alias": "gold"
    }
  }
}

PUT gold-000001
{
  "aliases": {
    "gold": {
      "is_write_index": true
    }
  }
}

Thank you all for your help!
