Index Lifecycle Error index.lifecycle.rollover_alias [< beat >-< version >] does not point to index issue

A while back I posted Index Lifecycle Management “does not point to index” error. I never fully resolved that issue, and because of other work priorities, the dev stack I was working on died.

Now, on a new stack that's actually in our data center, the error has cropped up again.

Judging from other posts, others are also running into the issue.

The bit I'm confused about is that I am using each Beat's setup command to manage ILM.

In their .yml files, the global config looks something like:

logging.files:
  keepfiles: 7
  name: < beat name >
  path: /var/log/< beat name >.log
  permissions: 420
logging.level: error
logging.to_files: true
bulk_max_size: 50
output:
  elasticsearch:
    enabled: true
    hosts:
    - eshost1:9200
    - eshost2:9200
    - eshost3:9200
    password: password
    protocol: https
    username: elastic
  logstash:
    bulk_max_size: 50
    enabled: false
    hosts:
    - logstashhost:5044
setup:
  ilm:
    enabled: true
    overwrite: true
  kibana:
    host: kibanahost:443
    password: password
    protocol: https
    username: blah_kibana_user
  template:
    settings:
      codec: best_compression
      index:
        number_of_replicas: 1
        number_of_shards: 1

For security purposes, most of my beats send their logs to Logstash. Logstash then puts them into Elasticsearch. The exceptions are my Elasticsearch nodes. The beats on those disable Logstash and push directly into Elasticsearch. Those are also where I run the beats setup command.

So, as far as I can tell, my ILM stuff should all be managed by beats. Right? As well as templates and almost all other settings.

If so, why is this error showing up?

The one thing I thought had fixed it was running the "Retry lifecycle step" option in Kibana's Index Management. That caused the error to go away, but it came back a while later.

Thanks in advance.

Any ideas?

Edit 1:

Oi, and rereading my post I see I didn't post the error... I was not fully with it that day...

The error is this:

Index lifecycle error
illegal_argument_exception: index.lifecycle.rollover_alias [filebeat-7.6.2] does not point to index [filebeat-7.6.2-2020.05.10]

That is the error on the filebeat-7.6.2-2020.05.10 index. Just replace the date and beat version with the date and beat version for the index you are looking at, and you have the same error on 364 indexes.

I'm currently on 7.8.0, so this issue is persistent across many versions.

When looking at the settings for filebeat-7.6.2-2020.05.10, index.lifecycle.rollover_alias is set to filebeat-7.6.2. That alias is pointing at filebeat-7.6.2-2020.06.29-000003 according to _cat/aliases.

From what I've read in the docs, that all seems like the way it should be. The indices should be rolled up into the filebeat-7.6.2-2020.06.29-000003 index. When certain conditions are met, a new index filebeat-7.6.2-2020.06.29-000004 (or something like that) would be created, and the filebeat-7.6.2 alias would be pointed at it. Then ILM would keep rolling all the daily indices into the rollover index.

So, what is up with that error message? I can't make any sense out of it.

Edit 2:

Would my logstash output config help cause this?

output {
  elasticsearch {
    hosts => ["blah:9200","blah2:9200","blah3:9200"]
    ilm_enabled => true
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

I had it pointed at a date based index for the beats. The better way would have been to point it at the alias that my esnode beats set up via ilm. And then let logstash autodetect ilm. Right?

So, like this:

output {
  elasticsearch {
    hosts => ["blah:9200","blah2:9200","blah3:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}"
  }
}

I'm having the same conundrum, same errors, same versions.
My confusion is that the setup is all 'out of the box', and it appears that the Beats defaults provided (e.g. the index name template in the Logstash pipeline config) do not take the ILM code into account (or vice versa).
I would very much like someone from @elastic to take a look at the out-of-the-box experience and see what the 'next steps' are.

In this post (ILM and alias confusion) we can see the same issue being addressed, albeit on a custom index, and the solution is to add '-00001' to the index name. This is because ILM needs a numeric suffix such as '-00001' at the end of the index name so that it can increment that number. This appears to be a hard-coded requirement.

Here is a request on GitHub for allowing ILM to use a custom index name (https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/869). The interesting thing about this conversation is the next post, which shows the effects when a custom index name is used vs. the auto-generated index name, which includes '-00001'.

I've gone ahead and updated my Metricbeat pipeline with index naming
index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}-00001"

I'll watch the roll-over status tomorrow. TGIF!

OK, here is what we've found through testing and reading about Index Lifecycle Management (ILM).
Our setup is Beats > Logstash > Elasticsearch.

Beats create Beat-name + version specific Index Templates
https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-template.html

The setup.template section of the metricbeat.yml config file specifies the index template to use for setting mappings in Elasticsearch. If template loading is enabled (the default), Metricbeat loads the index template automatically after successfully connecting to Elasticsearch.

The Beats expect that the data they publish into Elasticsearch is placed into indices built upon the Beat-name + version specific template. For example, you wouldn't want to push Filebeat information into a Metricbeat index, nor do you want to publish Metricbeat 7.6.0 data into a Metricbeat 7.8.0 index.


Beats also build a Beat-name specific Index Lifecycle Policy (Yeah!)
https://www.elastic.co/guide/en/beats/metricbeat/current/ilm.html

Starting with version 7.0, Metricbeat uses index lifecycle management [ILM] by default when it connects to a cluster that supports lifecycle management. Metricbeat loads the default policy automatically and applies it to any indices created by Metricbeat.

The ILM policy has basic parameters, like rolling over after 50 GB or 30 days, to keep the index from getting unwieldy.
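As a rough illustration, a policy with just those two rollover thresholds could be written out as the request body below. This is a minimal sketch in Python that only builds and prints the JSON; the 50gb / 30d values are the ones mentioned above, and the policy name in the comment is a placeholder, not something taken from our cluster.

```python
import json

# Sketch of an ILM policy body with a single rollover action in the hot phase.
# The thresholds mirror the "50 GB or 30 days" mentioned above.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_size": "50gb", "max_age": "30d"}
                }
            }
        }
    }
}

# This would be the body of: PUT _ilm/policy/<policy name>   (name is a placeholder)
print(json.dumps(policy, indent=2))
```

Comparing this against the output of GET _ilm/policy on your own cluster is a quick way to see what the Beat actually loaded.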


Index Lifecycle Management - ILM policies
The Index Lifecycle policy defines how ILM handles THE Rollover. There is only ONE Rollover action: moving the 'Write Alias' (aka Rollover Alias) from the active index to a new index, which makes the old index non-active. All subsequent actions, such as moving to the Warm or Cold phase or being deleted, are just actions based on the rollover; they are not in themselves rollovers from one state to another. (*Note: the non-active index can still be written to, but any pipelines pointing to the 'Write Alias' will only write to the active index.)

ILM policies are tagged to an Index Template and are given a specific 'Write Alias' for that template name. So if you have multiple Index Templates (Metricbeat-7.6.0 & Metricbeat-7.8.0) they would each have their own 'Write Alias' that the ILM policy would be looking at. In this manner the ILM policy assumes that you have ONE and only one index assigned to the 'Write Alias' for a given Index Template.

As indicated above the ILM Rollover is the action of moving the 'Write Alias' from the active Index to a new Index based on the Index Template. It relies on the Alias name attached to the 'write' index to know what is the active index to take action on.
More on that here as written by @abdon Index Lifecycle Management "does not point to index" error
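To make the alias dependency concrete, here is a toy model in Python. It is not Elasticsearch code: the function names and the `aliases` dictionary are invented for illustration, and it models only the simplest case, where the alias moves from the old index to the new one. The index names are the ones from the original post.

```python
# Toy model of alias bookkeeping; this is NOT Elasticsearch code.
# `aliases` maps an alias name to the set of indices it currently points to.

def check_rollover_alias(aliases, rollover_alias, index):
    """Raise if `rollover_alias` is not attached to `index`."""
    if index not in aliases.get(rollover_alias, set()):
        raise ValueError(
            f"index.lifecycle.rollover_alias [{rollover_alias}] "
            f"does not point to index [{index}]"
        )

def rollover(aliases, rollover_alias, old_index, new_index):
    """Move the write alias from `old_index` to `new_index` (simplest case)."""
    check_rollover_alias(aliases, rollover_alias, old_index)
    aliases[rollover_alias] = (aliases[rollover_alias] - {old_index}) | {new_index}

aliases = {"filebeat-7.6.2": {"filebeat-7.6.2-2020.06.29-000003"}}

# A daily index written directly by Logstash never had the alias attached,
# so the check fails with the message seen in Kibana.
try:
    check_rollover_alias(aliases, "filebeat-7.6.2", "filebeat-7.6.2-2020.05.10")
except ValueError as err:
    print(err)

# The index that does carry the alias can be rolled over.
rollover(
    aliases,
    "filebeat-7.6.2",
    "filebeat-7.6.2-2020.06.29-000003",
    "filebeat-7.6.2-2020.06.29-000004",
)
print(aliases)
```

In this model the daily indices fail the check simply because they picked up the rollover_alias setting from the template without ever being attached to the alias itself.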


Logstash
Logstash has a default Pipeline sample for inputting Beats data into Elasticsearch
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html

index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"

This creates nice indices with Beat-name + version specific naming and adds the nicety of the date on it. However this index naming does NOT meet the naming conventions that ILM requires. ILM expects a numeric suffix that it can increment such as '-000001'. This creates a set of indices that are not ILM managed and must be dealt with manually.
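The naming rule can be sketched in a few lines of Python. This is an approximation for illustration only: the helper names are made up, and it assumes the rule is "the name must end in a hyphen followed by digits, and the next name is that number plus one, zero-padded to six digits". It does not model date math in index names.

```python
import re

# Assumed rule: a rollover-capable index name ends in "-" followed by digits.
ROLLOVER_NAME = re.compile(r"^(.*)-(\d+)$")

def is_rollover_name(index_name):
    """Return True if the name ends in a hyphen plus a numeric suffix."""
    return ROLLOVER_NAME.match(index_name) is not None

def next_rollover_name(index_name):
    """Increment the numeric suffix, zero-padded to six digits."""
    match = ROLLOVER_NAME.match(index_name)
    if match is None:
        raise ValueError(f"[{index_name}] does not end in '-<number>'")
    prefix, number = match.groups()
    return f"{prefix}-{int(number) + 1:06d}"

# A date-only name has no numeric suffix to increment.
print(is_rollover_name("metricbeat-7.8.0-2020.07.03"))         # False
# Adding a numeric suffix makes the name usable for rollover.
print(is_rollover_name("metricbeat-7.8.0-2020.07.03-00001"))   # True
print(next_rollover_name("filebeat-7.6.2-2020.06.29-000003"))  # filebeat-7.6.2-2020.06.29-000004
```

Note that a valid name alone is not enough: the index still has to carry the rollover alias for ILM to act on it.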

To have the Pipeline write to the 'Active Index' for an index group (ILM managed indices based on an Index Template) you need to point to the 'Write Alias'. You can do this in your pipeline config:

ilm_rollover_alias => "metricbeat-7.8.0"

However, using this ILM rollover alias (what I've been calling the 'Write Alias') has severe limitations
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html

You cannot use dynamic variable substitution when ilm_enabled is true and when using ilm_rollover_alias .

If you’re sending events to the same Elasticsearch cluster, but you’re targeting different indices you can:

  • use different Elasticsearch outputs, each one with a different value for the index parameter
  • use one Elasticsearch output and use the dynamic variable substitution for the index parameter

The article goes on to say that you shouldn't use too many Elasticsearch outputs to a given cluster due to each output needing its own resources.

-***************************-
So we had some open questions

  1. How to build pipelines that are Beat-name + Version independent but still point to the correct 'Write Alias'? We don't want to modify the Pipelines each time we bring on a new Beat name and/or Beat version.

    • What we found out here is that you can point the Pipeline to 'index => write_alias' instead of 'index => index_name'. Obviously there isn't an actual index named 'Write Alias' but to Elasticsearch inputs it all looks the same.
  2. So how do we build the initial index for a given Beat-name + Version based on the Index Template? We know that the pipeline will build the index for us based on the Index Template that matches the incoming Beats data. This means that when a new Beat-name + Version builds its Index Template and starts sending data, we can get that new Beats data into a correct index. However, if we are using a pipeline with 'ilm_rollover_alias =>' pointing to a 'write-alias', an index carrying that 'write-alias' has to exist first!

    • Note: we found that if you allow Logstash to build an index using 'index =>', it will create the proper index but it will NOT attach the index template's ILM-assigned 'Write Alias' to that new index.

Here is our new Beat onboarding work-around in our test environment. We're hoping we don't have to do this long term:

  1. Install the Beat and run "[beat.exe] setup -e". This will establish the new Beat+version specific template (and Kibana dashboards too).

  2. Assign the ILM policy and provide a 'Write Alias' (aka rollover_alias) to the new Index Template by using the Kibana ILM tools > action, assign to Index Template. We use the alias name "[Beatname]-[version]-write-alias". Note: the alias name cannot be the same as the index name.

  3. Create an index [beatname]-[version]-000001 using the Create Index API (curl or Kibana dev tool). Note that index templates are matched against the new index name, so be sure to match the [beatname]-[version] of the new Index Template.

  4. Add the "[Beatname]-[version]-write-alias" alias to the new index (curl or Kibana dev tool)

  5. In Logstash Pipeline YML use the new name ‘[beatname]-[version]-write-alias’

    • index => "%{[@metadata][beat]}-%{[@metadata][version]}-write-alias"
    • leave out the ilm_rollover_alias and ilm_pattern options
  6. Restart Logstash

  7. Confirm documents are written to the index [beatname]-[version]-000001 via the above 'Write Alias'

  8. Confirm rollover works to alias ‘[beatname]-[version]-write-alias’. You can use the rollover APIs to force this. You can use the ?dry_run parameter to test first.
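Steps 3, 4 and 8 above boil down to three REST calls. The Python sketch below only builds and prints them so they can be pasted into Kibana Dev Tools; it sends nothing to a cluster. The function name is made up, and setting "is_write_index" on the alias is an assumption on my part rather than something we tested in the steps above.

```python
import json

def onboarding_requests(beat, version):
    """Build the requests for steps 3, 4 and 8 as (method, path, body) tuples."""
    base = f"{beat}-{version}"
    first_index = f"{base}-000001"
    write_alias = f"{base}-write-alias"
    return [
        # Step 3: create the first index; the name must match the template pattern.
        ("PUT", f"/{first_index}", None),
        # Step 4: attach the write alias to that index.
        # "is_write_index": True is an assumption, not part of the tested steps.
        ("POST", "/_aliases", {
            "actions": [
                {"add": {"index": first_index,
                         "alias": write_alias,
                         "is_write_index": True}}
            ]
        }),
        # Step 8: ask what a rollover would do without performing it.
        ("POST", f"/{write_alias}/_rollover?dry_run", None),
    ]

for method, path, body in onboarding_requests("metricbeat", "7.8.0"):
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Dropping ?dry_run from the last request forces the actual rollover, so check the dry-run response first.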


I hope that helps!

I'm guessing there are no replies here from Elastic because part of the scope of this changes when data streams (7.9.0+) are in play?