Hi All,
I have an ELK cluster with 3 nodes, running version 5.5.1. There was a brief connectivity issue between the servers (they could not ping each other for a minute or so), which left the log data / shards out of sync between the nodes.
All 3 servers were then rebooted, and when they came back up, one of the nodes kept logging the warning below:
[2017-12-19T16:39:55,783][WARN ][o.e.g.DanglingIndicesState] [sv-ocb3] [[filebeat-2017.12.07-000091/KFLrF95qTXSXvuZQ6WQLaQ]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
[2017-12-19T16:39:56,463][WARN ][o.e.g.DanglingIndicesState] [sv-ocb3] [[filebeat-2017.12.07-000091/KFLrF95qTXSXvuZQ6WQLaQ]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
Two days later, I saw the dangling index being auto-imported, as shown below.
Logs from node sv-ocb3:
[2017-12-21T16:00:02,616][WARN ][o.e.g.DanglingIndicesState] [sv-ocb3] [[filebeat-2017.12.07-000091/KFLrF95qTXSXvuZQ6WQLaQ]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
[2017-12-21T16:00:02,734][INFO ][o.e.g.DanglingIndicesState] [sv-ocb3] [[filebeat-2017.12.07-000091/KFLrF95qTXSXvuZQ6WQLaQ]] dangling index exists on local file system, but not in cluster metadata, auto import to cluster state
Logs from node sv-ocb2:
[2017-12-21T16:00:02,829][INFO ][o.e.c.m.MetaDataMappingService] [sv-ocb2] [filebeat-2017.12.07-000138/eh8vAvnTTseaCkiBoPV6ZA] update_mapping [log]
[2017-12-21T16:00:02,854][INFO ][o.e.g.LocalAllocateDangledIndices] [sv-ocb2] auto importing dangled indices [[filebeat-2017.12.07-000091/KFLrF95qTXSXvuZQ6WQLaQ]/OPEN] from [{sv-ocb3}{4RtFLtGzSFuchKWgtE7lPQ}{UOJrt9OoTSSqvT_cpWQJ9Q}{sv-ocb3.iwojima.com}{10.30.4.19:9400}]
[2017-12-21T16:00:03,342][INFO ][o.e.c.r.a.AllocationService] [sv-ocb2] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[filebeat-2017.12.07-000091][2]] ...]).
[2017-12-21T16:00:04,561][INFO ][o.e.c.r.a.AllocationService] [sv-ocb2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[filebeat-2017.12.07-000091][3]] ...]).
After this auto import, Logstash kept throwing the error below for every request it received from Filebeat.
[2017-12-21T16:00:07,687][ERROR][logstash.outputs.elasticsearch] Got a bad response code from server, but this code is not considered retryable. Request will be dropped {:code=>400, :response_body=>"{\"error\":{\"root_cause\":[{\"type\":\"illegal_argument_exception\",\"reason\":\"Alias [filebeat_logs] has more than one indices associated with it [[filebeat-2017.12.07-000091, filebeat-2017.12.07-000138]], can't execute a single index op\"}],\"type\":\"illegal_argument_exception\",\"reason\":\"Alias [filebeat_logs] has more than one indices associated with it [[filebeat-2017.12.07-000091, filebeat-2017.12.07-000138]], can't execute a single index op\"},\"status\":400}"}
So after the auto import of the dangling index, the alias 'filebeat_logs', which is supposed to point only to the latest index, also started pointing to the re-imported index filebeat-2017.12.07-000091, and single-index write operations against the alias began failing.
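For reference, this is roughly how the alias can be inspected and the re-imported index detached from it again, so that writes through the alias resolve to a single index. It is only a minimal sketch using Python with the requests library; the host/port is a placeholder, and the alias and index names are the ones from the logs above.

```python
import json
import requests

ES = "http://localhost:9200"  # placeholder; point this at one of the cluster nodes
ALIAS = "filebeat_logs"
STALE_INDEX = "filebeat-2017.12.07-000091"  # the re-imported dangling index

# 1) Inspect the alias: after the auto import it resolves to two indices.
resp = requests.get(f"{ES}/_alias/{ALIAS}")
print(json.dumps(resp.json(), indent=2))

# 2) Remove the alias from the stale index so only the latest index receives writes.
actions = {"actions": [{"remove": {"index": STALE_INDEX, "alias": ALIAS}}]}
resp = requests.post(f"{ES}/_aliases", json=actions)
print(resp.status_code, resp.text)
```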
Is there something I can do to avoid running into this issue again the next time there are connectivity problems between the nodes?