Node won't start up due to duplicate alias/index names

I have just added a whitelist to one of the nodes so I could pull over some data for a feature we have been working on in development. We now want to run it on the main cluster but keep all the historic data we gathered in development.

I configured the elasticsearch.yml file and then ran systemctl restart elasticsearch.service to cycle the service on the node in question, which is running CentOS 7 and Elasticsearch 6.6.1.

The node, however, failed to start. When I looked in the logs I found the following:

java.lang.IllegalStateException: index and alias names need to be unique, but the following duplicates were found [.kibana (alias of [.kibana_2/15QKiqziRlGHM6AyCS0WjA])]

When I checked the cluster health it still reported as green.

When I checked _cat/nodes it still listed the restarted node as an active member, although it had no load stats for it. Since I started writing this, the cluster has noticed that the node is gone and now shows yellow.

I tried restarting it again, thinking maybe it didn't start the first time because the cluster saw that member as already joined, but I got the same error in the logs.

We recently upgraded from 6.3.1 to 6.6.1, in case that has a bearing.

Looking at the Kibana indices we have, there is no index called ".kibana", but we do have ".kibana_2" and ".kibana_1". Checking the aliases for those, there are none on ".kibana_1", but ".kibana_2" holds the alias ".kibana".

My only thought is that I should reindex the .kibana_2 index into a holding index such as "backup_kibana" (it holds 24 documents at the moment, a combination of index patterns and visualisations), then delete the ".kibana_2" index and restart the service with the index now gone. If all that does is make it moan about some other index, though, I'm in less of a position to reindex those.

Could you share the stack trace under this? The message is saying that there is an index called .kibana on this node, but if there's no such index in the cluster then the stack trace will give us more of a clue where it's coming from.

Hi @DavidTurner

[2019-03-29T13:27:23,039][ERROR][o.e.g.GatewayMetaState   ] [hostname] failed to read local state, exiting...
java.lang.IllegalStateException: index and alias names need to be unique, but the following duplicates were found [.kibana (alias of [.kibana_2/15QKiqziRlGHM6AyCS0WjA])]
	at org.elasticsearch.cluster.metadata.MetaData$Builder.build(MetaData.java:1118) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.gateway.MetaStateService.loadFullState(MetaStateService.java:73) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:88) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.node.Node.<init>(Node.java:497) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.node.Node.<init>(Node.java:265) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:212) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:212) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) [elasticsearch-cli-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.Command.main(Command.java:90) [elasticsearch-cli-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:116) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) [elasticsearch-6.6.1.jar:6.6.1]
[2019-03-29T13:27:23,047][ERROR][o.e.b.Bootstrap          ] [hostname] Exception
java.lang.IllegalStateException: index and alias names need to be unique, but the following duplicates were found [.kibana (alias of [.kibana_2/15QKiqziRlGHM6AyCS0WjA])]
	at org.elasticsearch.cluster.metadata.MetaData$Builder.build(MetaData.java:1118) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.gateway.MetaStateService.loadFullState(MetaStateService.java:73) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:88) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.node.Node.<init>(Node.java:497) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.node.Node.<init>(Node.java:265) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:212) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:212) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) [elasticsearch-cli-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.Command.main(Command.java:90) [elasticsearch-cli-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:116) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) [elasticsearch-6.6.1.jar:6.6.1]
[2019-03-29T13:27:23,050][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [hostname] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: index and alias names need to be unique, but the following duplicates were found [.kibana (alias of [.kibana_2/15QKiqziRlGHM6AyCS0WjA])]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:163) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:116) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.6.1.jar:6.6.1]
Caused by: java.lang.IllegalStateException: index and alias names need to be unique, but the following duplicates were found [.kibana (alias of [.kibana_2/15QKiqziRlGHM6AyCS0WjA])]
	at org.elasticsearch.cluster.metadata.MetaData$Builder.build(MetaData.java:1118) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.gateway.MetaStateService.loadFullState(MetaStateService.java:73) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:88) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.node.Node.<init>(Node.java:497) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.node.Node.<init>(Node.java:265) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:212) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:212) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-6.6.1.jar:6.6.1]
	... 6 more
[2019-03-29T13:27:23,054][INFO ][o.e.x.m.p.NativeController] [hostname] Native controller process has stopped - no new native processes can be started

That's the full stack trace for the error.

Also, while there is no index called .kibana, as mentioned there is an alias of .kibana on the index .kibana_2, so I'm guessing that alias is why it's having the issue, and I imagine that index held a replica on the node that was cycled. My guess is that for some reason it's seeing the active .kibana_2 and the .kibana_2 replica that is trying to re-integrate as different things, rather than as a current and a stale copy of the same thing.

Hmm, ok, this is strange. At some point in the past has this node changed role, e.g. by changing node.data: true to node.data: false?

Since the cluster health reports as green I think the simplest thing to do is wipe the data folder on this node and start it afresh. Be double-sure that the cluster really is green first (i.e. that you're not looking at a different cluster's health).

No, it's really complaining about an index called .kibana clashing with an alias called .kibana.

No, the node.data value has never been changed.

Also, it did show as green initially, as it seemed to think that the node was still a member of the cluster. After about 10-15 minutes, however, it realised the cluster was only running two nodes and changed status to yellow, which is where it is currently, replicating the data between the two existing nodes.

If I run the command GET .kibana/_alias I get the response:

{
  ".kibana_2" : {
    "aliases" : {
      ".kibana" : { }
    }
  }
}
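
For what it's worth, the clash the startup error describes (an alias name that is also an index name) can be checked for mechanically in a response of this shape. A minimal Python sketch, assuming the input is the parsed JSON of an alias listing like the one above. The function name `find_name_clashes` and both sample dicts are made up for illustration, and the `broken` sample is a hypothetical state, not something taken from this cluster:

```python
import json


def find_name_clashes(alias_response):
    """Return alias names that are also used as index names.

    `alias_response` maps index name -> {"aliases": {alias_name: {...}}}.
    """
    index_names = set(alias_response)
    clashes = set()
    for body in alias_response.values():
        for alias in body.get("aliases", {}):
            if alias in index_names:
                clashes.add(alias)
    return sorted(clashes)


# Shape reported by the live cluster in this thread: no clash.
healthy = json.loads(
    '{".kibana_2": {"aliases": {".kibana": {}}}, ".kibana_1": {"aliases": {}}}'
)

# Hypothetical state with a leftover ".kibana" index alongside the alias.
broken = dict(healthy)
broken[".kibana"] = {"aliases": {}}

print(find_name_clashes(healthy))
print(find_name_clashes(broken))
```

Run against the live cluster's view this reports nothing, which fits the point made later in the thread: the clash exists only in the state stored locally on the node that will not start.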

Ok, I would suggest waiting for the health to be green again before wiping the problematic node, just in case.

Or are you saying it's the node being cycled that thinks it has an index called .kibana on it? In which case, would removing the alias from '.kibana_2' be a solution?

Prior to the upgrade from 6.3.1 to 6.6.1 there was a .kibana index. It was only after the upgrade that it was split into .kibana_1 and .kibana_2 and the .kibana index was removed.

My concern with that is that, due to the size of the indices, it will take some time (days rather than hours) before the cluster goes green again. While I think the two existing nodes have just enough space to manage that, it would be right at the limit.

If it is just the Kibana index that is the issue, would stopping Kibana and removing both .kibana_# indices from the active cluster allow me to restart the node, then start Kibana back up and have it create a new .kibana index?

If it's just index patterns and visualisations that I'd need to re-create, I can live with that. I'm guessing index templates live elsewhere in Elasticsearch, as having to re-create those would be a little more serious for me.

Unfortunately it's not so simple. The problem that node is facing is local to that node - it's failing before the point where it would find out that you've deleted the .kibana_2 index from the rest of the cluster.

I must say I am still puzzled as to how this happened. Could you search your logs for the words Dangling or Dangled and share any matching messages?
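
If grepping is awkward, a small Python sketch along these lines does the same search case-insensitively. The function name and the sample lines are invented placeholders, not real Elasticsearch log output; in practice you would pass an open log file instead of the list.

```python
import re

# Matches "Dangling" or "Dangled" in any letter case.
DANGLING = re.compile(r"dangl(?:ing|ed)", re.IGNORECASE)


def find_dangling_mentions(lines):
    """Return (line_number, text) for every line mentioning dangling indices."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if DANGLING.search(line)
    ]


# Placeholder input; replace with `open(path_to_log)` for a real log file.
sample = [
    "placeholder line with nothing of interest\n",
    "placeholder line mentioning Dangling indices\n",
    "placeholder line saying an index was dangled\n",
]

for number, text in find_dangling_mentions(sample):
    print(number, text)
```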

Waiting for green is technically optional. Your cluster is yellow so it still has at least one copy of every shard. This means you could in theory start the troublesome node afresh right now. It depends how much you trust your disks: the remaining copy of a shard might have been silently corrupted and you won't find that out until it fails to recover. These things do happen.
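
As a rough illustration of that reasoning, here is a Python sketch that turns a parsed cluster health response into advice. The `status` field and its green/yellow/red values are what the cluster health API returns; the function names, the advice strings and the `http://localhost:9200` address are assumptions for the example.

```python
import json
import urllib.request


def wipe_advice(health):
    """Map a parsed _cluster/health response to rough advice on wiping a node."""
    status = health["status"]
    if status == "green":
        return "ok: every shard has all of its copies assigned"
    if status == "yellow":
        return "risky: all primaries are assigned but some replicas are missing"
    return "stop: at least one primary shard is unassigned"


def fetch_health(base_url="http://localhost:9200"):
    """Fetch cluster health from an assumed address (not called in this sketch)."""
    with urllib.request.urlopen(base_url + "/_cluster/health") as response:
        return json.load(response)


print(wipe_advice({"status": "yellow"}))
```

Make sure `fetch_health` points at the cluster you mean to check, for the same reason as the earlier warning about looking at a different cluster's health.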

I checked today's and yesterday's log files: no references to either Dangled or Dangling.

So you're saying wipe all the data from the node and start it back up?

Me, I'd wait for green.

But regardless of whether I wait or do it now, I just delete all the content in my data directories and then restart the service?

Is there a way of identifying the .kibana and .kibana_2 indices on the node without starting it, so that I could delete just those and remove the issue that way?

Yes that's right. Just the data directories for this one node, of course.

Unfortunately there's no safe way to do this kind of surgery on the data directory by hand.

I have a VM cluster I use to test upgrade scripts and the like. I decided to test this there first, as it's not something I've done before. I stopped one node, signed on, and deleted the nodes directories under /data1/elasticsearch and /data2/elasticsearch (I have two drives set up dedicated to data on the VMs). I then restarted the node and it joined back into the cluster, but none of the data has migrated back to it.

Is there something else I should have done to flush the data out of the box?

Is this cluster in yellow or green health? If yellow, does the allocation explain API shed any light on why it's not assigning anything to the wiped node? If green you might still get some useful information out of the allocation explain API if you try and ask it about a specific shard; if not, try manually migrating a shard and see whether that works.

One possibility is that delayed allocation is temporarily preventing any shard movement, if you restarted the node quickly enough. Another possibility is that allocation or rebalancing are disabled.
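
To make those checks concrete, here is a hedged Python sketch. It builds the request body for the allocation explain API (`GET _cluster/allocation/explain`) and inspects a `GET _cluster/settings?flat_settings=true` response for disabled allocation or rebalancing. The setting names are the real `cluster.routing.allocation.enable` and `cluster.routing.rebalance.enable`; the function names and the index name `my-index` are hypothetical.

```python
ENABLE_KEYS = (
    "cluster.routing.allocation.enable",
    "cluster.routing.rebalance.enable",
)


def explain_request_body(index, shard, primary):
    """Body for GET _cluster/allocation/explain about one specific shard copy."""
    return {"index": index, "shard": shard, "primary": primary}


def allocation_blockers(flat_settings):
    """Return the enable settings whose effective value is not "all".

    Transient settings take precedence over persistent ones, and both
    settings default to "all" when unset.
    """
    transient = flat_settings.get("transient", {})
    persistent = flat_settings.get("persistent", {})
    blockers = {}
    for key in ENABLE_KEYS:
        value = transient.get(key, persistent.get(key, "all"))
        if value != "all":
            blockers[key] = value
    return blockers


# Hypothetical settings response with allocation switched off.
settings = {
    "persistent": {"cluster.routing.allocation.enable": "none"},
    "transient": {},
}
print(allocation_blockers(settings))
print(explain_request_body("my-index", 0, primary=False))
```

An empty result from `allocation_blockers` only rules out these two settings; delayed allocation is governed separately by the per-index `index.unassigned.node_left.delayed_timeout` setting.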

Just to follow up: I deleted the contents of the data directory under the nodes folder and restarted the node, and the cluster is now rebalancing.
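
For anyone repeating this, a cautious first step is to list what would be removed before deleting anything. The Python sketch below only reads the filesystem and deletes nothing. It assumes each data path contains a `nodes` directory, as described in this thread; the function names are made up, and the demo runs against a throwaway temporary layout rather than real data paths such as /data1/elasticsearch.

```python
import tempfile
from pathlib import Path


def find_nodes_dirs(data_paths):
    """Return the `nodes` directory under each data path that has one."""
    found = []
    for data_path in data_paths:
        candidate = Path(data_path) / "nodes"
        if candidate.is_dir():
            found.append(candidate)
    return found


def dir_size_bytes(path):
    """Total size in bytes of all regular files below `path`."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


# Demo on a temporary layout that mimics two data paths, one already empty.
with tempfile.TemporaryDirectory() as root:
    data1 = Path(root) / "data1" / "elasticsearch"
    data2 = Path(root) / "data2" / "elasticsearch"
    (data1 / "nodes" / "0").mkdir(parents=True)
    data2.mkdir(parents=True)
    (data1 / "nodes" / "0" / "example.bin").write_bytes(b"x" * 10)

    for nodes_dir in find_nodes_dirs([data1, data2]):
        print(nodes_dir, dir_size_bytes(nodes_dir))
```

Only run anything like this, and the deletion that follows it, with Elasticsearch stopped on that one node.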
