Metricbeat: "More than one write index" errors in version 7.13.x

Ever since upgrading the Elastic Stack from 7.12.1 to 7.13.x, Metricbeat (and every other Beat) has frequently hung at startup with the following error:

2021-06-08T22:01:05.930Z	ERROR	[index-management.ilm]	ilm/std.go:128	Index Alias metricbeat-7.13.1 setup failed: failed to create alias: {"error":{"root_cause":[{"type":"illegal_state_exception","reason":"alias [metricbeat-7.13.1] has more than one write index [metricbeat-7.13.1-2021.06.03-000001,metricbeat-7.13.1-2021.06.08-000001]"}],"type":"illegal_state_exception","reason":"alias [metricbeat-7.13.1] has more than one write index [metricbeat-7.13.1-2021.06.03-000001,metricbeat-7.13.1-2021.06.08-000001]"},"status":500}: 500 Internal Server Error: {"error":{"root_cause":[{"type":"illegal_state_exception","reason":"alias [metricbeat-7.13.1] has more than one write index [metricbeat-7.13.1-2021.06.03-000001,metricbeat-7.13.1-2021.06.08-000001]"}],"type":"illegal_state_exception","reason":"alias [metricbeat-7.13.1] has more than one write index [metricbeat-7.13.1-2021.06.03-000001,metricbeat-7.13.1-2021.06.08-000001]"},"status":500}.

This seems to occasionally occur when we restart all our metricbeat instances.
Here's the part of our metricbeat config I think is relevant to this issue:

setup.ilm.rollover_alias: metricbeat-%{[agent.version]}
setup.ilm.policy_name: metricbeat-policy
setup.ilm.policy_file: /usr/share/metricbeat/ilm-policy.json
setup.ilm.overwrite: true
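
The two index names in the error differ only in their date part (2021.06.03 vs. 2021.06.08), which suggests the first index name is derived from the rollover alias plus the current date. A minimal Python sketch of that derivation, assuming a pattern of UTC date followed by -000001 as seen in the log; the helper name is hypothetical and this is not Beats code:

```python
from datetime import datetime, timezone


def bootstrap_index_name(alias: str, now: datetime) -> str:
    """Return alias + UTC date + '-000001' (assumed pattern, inferred from the log)."""
    day = now.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"{alias}-{day}-000001"


# Timestamp taken from the log line above.
restart_time = datetime(2021, 6, 8, 22, 1, 5, tzinfo=timezone.utc)
print(bootstrap_index_name("metricbeat-7.13.1", restart_time))
```

If that assumption holds, a restart on a later day would compute a different first index name than the one already attached to the alias, which matches the pair of names in the error.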

And here's the corresponding ILM policy file:

{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "50gb"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

When that happens, can you check the output from _cat/aliases/metricbeat*?v?

Here's the output from GET _cat/aliases/metricbeat*?v=

alias             index                               filter routing.index routing.search is_write_index
metricbeat-7.13.1 metricbeat-7.13.1-2021.06.03-000001 -      -             -              true

Thanks, but the output needs to be captured while the error is occurring, to see whether there are indeed two indices attached to the alias.

The error is occurring. I'm watching the metricbeat logs and running the above aliases query as I type this message. Only one index ever shows up in the output.
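
For anyone who wants to repeat this check while watching the logs, here is a small Python sketch that looks for aliases with more than one write index. It assumes the rows come from GET _cat/aliases/metricbeat*?format=json, where each row is an object with "alias", "index", and "is_write_index" keys and the flag is the string "true"; the function names are hypothetical.

```python
from collections import defaultdict


def write_indices_by_alias(rows):
    """Group write indices by alias from parsed _cat/aliases JSON rows."""
    grouped = defaultdict(list)
    for row in rows:
        # The cat API is assumed to report the flag as the string "true".
        if row.get("is_write_index") == "true":
            grouped[row["alias"]].append(row["index"])
    return dict(grouped)


def conflicting_aliases(rows):
    """Return only the aliases that have more than one write index."""
    return {
        alias: indices
        for alias, indices in write_indices_by_alias(rows).items()
        if len(indices) > 1
    }


# Row equivalent to the _cat/aliases output posted above.
rows = [
    {
        "alias": "metricbeat-7.13.1",
        "index": "metricbeat-7.13.1-2021.06.03-000001",
        "is_write_index": "true",
    }
]
print(conflicting_aliases(rows))
```

With the single row shown in this thread the result is empty, which is consistent with only one index ever appearing in the alias output.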

I found an issue on GitHub that appears to be exactly the issue I'm having:

And here's the corresponding fix:

Is there an ETA when this will be released?

It looks like this was released in v7.13.2?

We also see this bug sometimes, at midnight. From the Elasticsearch logs:

alias [apm-7.13.2-span] has more than one write index [apm-7.13.2-span-000013,apm-7.13.2-span-000001]

The APM queue becomes full and does not accept any new data.