I found an issue on GitHub that appears to be exactly the issue I'm having:
opened 03:17AM - 04 Jun 21 UTC
closed 04:26AM - 08 Jun 21 UTC
bug
needs_team
When ILM is enabled, libbeat will attempt to create a write alias. It does this by creating an index and attempting to point the write alias to it:
https://github.com/elastic/beats/blob/dc576280f49baec169cd86f64a043cdc3d4fbab6/libbeat/idxmgmt/ilm/client_handler.go#L178-L208
If the _index_ already exists, Elasticsearch will return 400; libbeat then checks if the alias exists, and carries on without error if it does.
If the index does not exist (e.g. because it rolled over), but the _write alias_ exists and points to some other index, then Elasticsearch returns 500. In this case, libbeat fails index management setup with an error like (taken from apm-server):
> Index Alias apm-7.13.0-span setup failed: failed to create alias: {"error":{"root_cause":[{"type":"illegal_state_exception","reason":"alias [apm-7.13.0-span] has more than one write index [apm-7.13.0-span-000001,apm-7.13.0-span-000008]"}],"type":"illegal_state_exception","reason":"alias [apm-7.13.0-span] has more than one write index [apm-7.13.0-span-000001,apm-7.13.0-span-000008]"},"status":500}: 500 Internal Server Error: {"error":{"root_cause":[{"type":"illegal_state_exception","reason":"alias [apm-7.13.0-span] has more than one write index [apm-7.13.0-span-000001,apm-7.13.0-span-000008]"}],"type":"illegal_state_exception","reason":"alias [apm-7.13.0-span] has more than one write index [apm-7.13.0-span-000001,apm-7.13.0-span-000008]"},"status":500}.
And here's the corresponding fix:
elastic:master
← axw:ilm-createalias-exists
opened 05:13AM - 04 Jun 21 UTC
## What does this PR do?
When creating the initial index/write alias fails, don't check the status code, just check if the alias exists regardless of the error. The status code will be different (at least 400 and 500) depending on failure scenario.
## Why is it important?
This fixes a bug where alias creation fails when the alias exists but points to another index, and the initial index does not exist, e.g. due to ILM deletion.
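To make the difference concrete, here is a toy sketch of the two behaviours described above. It is not libbeat's code (which is Go): `FakeCluster` and both `ensure_alias_*` functions are hypothetical names invented for illustration, and the in-memory "cluster" only imitates the 400/500 responses mentioned in the issue.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Toy stand-in for Elasticsearch: index name -> {alias: is_write_index}."""
    indices: dict = field(default_factory=dict)

    def alias_exists(self, alias):
        return any(alias in aliases for aliases in self.indices.values())

    def create_index_with_write_alias(self, index, alias):
        """Return an HTTP-like status, imitating the codes from the issue."""
        if index in self.indices:
            return 400  # the index already exists
        if any(aliases.get(alias) for aliases in self.indices.values()):
            return 500  # the alias already has a write index elsewhere
        self.indices[index] = {alias: True}
        return 200


def ensure_alias_status_check(cluster, index, alias):
    """Behaviour described in the issue: only a 400 leads to the alias check."""
    status = cluster.create_index_with_write_alias(index, alias)
    if status == 200:
        return True
    return status == 400 and cluster.alias_exists(alias)


def ensure_alias_any_error(cluster, index, alias):
    """Behaviour described in the PR: on any failure, just check the alias."""
    status = cluster.create_index_with_write_alias(index, alias)
    return status == 200 or cluster.alias_exists(alias)


# State after a rollover plus deletion of the initial index (names from the error above).
rolled = FakeCluster({"apm-7.13.0-span-000008": {"apm-7.13.0-span": True}})
old_ok = ensure_alias_status_check(rolled, "apm-7.13.0-span-000001", "apm-7.13.0-span")
new_ok = ensure_alias_any_error(rolled, "apm-7.13.0-span-000001", "apm-7.13.0-span")
```

In this model `old_ok` is `False` because the 500 is never followed by an alias check, while `new_ok` is `True` because the alias is found regardless of the status code.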
## Checklist
- [x] My code follows the style guidelines of this project
- [x] I have commented my code, particularly in hard-to-understand areas
~- [ ] I have made corresponding changes to the documentation~
~- [ ] I have made corresponding change to the default configuration files~
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
## How to test this PR locally
1. Run a beat with `setup.ilm.overwrite: true`
2. In Elasticsearch, force a rollover (`POST /<alias>/_rollover`) and delete the initial index
3. Restart the beat; there should be no errors due to creating the write alias
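Step 2 comes down to two REST calls against Elasticsearch. The helper below is only a sketch that spells them out; `repro_requests` is a hypothetical name, and the alias and index names are borrowed from the error message in the issue purely as an example.

```python
def repro_requests(alias, initial_index):
    """List the (method, path) pairs for step 2: force a rollover, then delete the initial index."""
    return [
        ("POST", f"/{alias}/_rollover"),   # force the rollover
        ("DELETE", f"/{initial_index}"),   # delete the index the alias first pointed to
    ]


# Example names taken from the error message quoted in the issue.
requests_to_send = repro_requests("apm-7.13.0-span", "apm-7.13.0-span-000001")
```

These can be sent with any HTTP client or pasted into the Kibana Dev Tools console before restarting the beat.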
## Related issues
Closes #26142
Is there an ETA for when this will be released?