Upgrade reindex helper creates duplicates of indexes that prevent launch

org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: index and alias names need to be unique, but the following duplicates were found [.watches (alias of [.watches-6/WIUCyhnGQPqZxIZ7Advo6Q])]

I get errors like the above when I upgrade the kibana and watcher indexes coming from 5.6.6 going to 6.3.1. I can delete the folder WIUCyhnGQPqZxIZ7Advo6Q in the data directory and get the node to boot once, but then it replicates from the other nodes, and the next node to get cycled has the same issue.

HELP!

I resolved it on Kibana by sending a curl statement to delete the kibana index and then setting everything up again from scratch. This seems counter to how an upgrade helper is supposed to work though, so if someone can give me some assistance, please do.
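For reference, the delete was just a plain index delete (assuming the default .kibana index name; adjust if yours differs):

curl -XDELETE -u elastic localhost:9200/.kibana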

Hey,

In order to fully understand your issue: how did you run the upgrade?

Did you first upgrade from 5.6 to 6.3 and then run the upgrade assistant, or vice versa? Did you do the upgrade when all the nodes were on the same version and the cluster was in a stable condition?

I think your last sentence was about deleting the .watches and not the kibana index?

And the last question: is this reproducible on every one of your upgrade attempts?

--Alex

Hi Alex,

was running 5.6.6

upgraded the security index with the helper first.

stopped elastic

uninstalled xpack from kibana and elastic

upgraded elastic from 5.6.6 to 6.3.1

configured certificates for SSL communication and restarted elastic

deleted the kibana index, as in earlier testing I ended up having to pull it anyway because it wouldn't upgrade correctly

installed kibana 6.3.1

started it up, signed in and turned on monitoring

used the curl statements
curl -XPOST -u elastic localhost:9200/_xpack/migration/upgrade/.triggered_watches
curl -XPOST -u elastic localhost:9200/_xpack/migration/upgrade/.watches
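For completeness, I believe the upgrade assistant drives the same migration API; the list of indices that still need upgrading can be pulled with the assistance endpoint, something like:

curl -XGET -u elastic 'localhost:9200/_xpack/migration/assistance?pretty'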

when I then restarted one of the elastic services it wouldn't start, reporting the error in the original post.
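The restart itself was nothing special; on CentOS 7 with the standard package install that's systemd (assuming the default service name):

sudo systemctl restart elasticsearch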

Missed the last question. I'm testing this through on VMs and it happens each time I reset and retry. Deleting the kibana index was what I resorted to, as we have so little data in it and I'd been asking for help in the elastic forum previously (this post was moved from there) without getting any responses, so I ended up deleting it out of desperation.

Thanks for your response!

Do you see any exceptions in the logs when running the migration upgrade for the two watcher indices? What does the response look like?

Before you start the upgrade, is the cluster properly formed? How many nodes are in this cluster?
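A quick way to verify is the cluster health API, e.g.:

curl -XGET -u elastic 'localhost:9200/_cluster/health?pretty'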

Also, can you test to ensure watcher is stopped before running the upgrade? And as an additional test, can you also run the upgrade of watcher before going to 6.3?
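Watcher exposes a stats API that reports its state, and it can be stopped explicitly before running the upgrade, along the lines of:

curl -XGET -u elastic 'localhost:9200/_xpack/watcher/stats?pretty'
curl -XPOST -u elastic localhost:9200/_xpack/watcher/_stop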

I will go review the logs quickly, but this was the response to the curl command

/_xpack/migration/upgrade/.watches
{
  "took": 1036,
  "timed_out": false,
  "total": 8,
  "updated": 0,
  "created": 8,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": { "bulk": 0, "search": 0 },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": []
}

I'm not sure how to test whether watcher is stopped, but in the case of kibana, which did the same thing, I know it was stopped.

For both the kibana and watcher indexes I did try running the index helper upgrade on 5.6.6, and that resulted in the same behaviour on 5.6.6 as I get on 6.3.1: if I navigate the file system and delete the folder for the index the node will start, but then it replicates from the other nodes, and when you restart it again it will have the same issue.

Do you have any special setup in terms of deployment? Is this containerized or running in a special VM setup?

I remember a certain fix in 5.6.3 regarding this, but as you are already using a newer version, this can't be it.

One more question: are you manually creating a watches index alias by chance?
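You can list all aliases and the concrete indices they point to with the cat API, e.g.:

curl -XGET -u elastic 'localhost:9200/_cat/aliases?v'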

Just realised I missed a few of your questions from before too. The cluster was formed and reported a status of green when asked for a health report. There are 3 nodes in the cluster, and while I can't recall exactly what time I ran the curl command for the watcher index, I can't see anything that relates to it in the logs, but they are littered with statements like this

 [[.watches/NpKMeiIFQdigvNfVB5R18A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
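That UUID differs from the one in the original startup error. For anyone comparing what's on disk against what the cluster metadata holds, the cat indices API shows the UUID the cluster actually has for each index, e.g.:

curl -XGET -u elastic 'localhost:9200/_cat/indices/.watches*?v'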

The VMs are just to test the upgrade process ahead of applying it to prod, which is on physical tin. They are Gen 1 Windows Hyper-V VMs using checkpoints to allow for quick resets in testing. They run CentOS 7, and the only real customisation is that the logs go to /logs/elasticsearch/clustername.log and data goes to /data1/elasticsearch and /data2/elasticsearch, which reports as an issue pre-upgrade but following a test run looked to have no impact and is documented here: Upgrade issue with path.data
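For reference, the relevant elasticsearch.yml customisation looks roughly like this (clustername is our cluster name, which sets the log file name):

path.logs: /logs/elasticsearch
path.data: ["/data1/elasticsearch", "/data2/elasticsearch"]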

In all honesty I haven't really touched anything on the watches on this test cluster. The VMs were built to let me test replacing the nodes in our cluster with new hardware and get a process on paper about a month ago, and since it was set up I've used it to test-run the upgrade process too. The test data was a 100k employee data sample I found online that I imported about 50 times to get the data footprint up. So thinking about it, no watches have actually been specifically configured; it's only what was done automatically that would have come into play.
