We have a 4-node cluster running Elasticsearch 6.5.2. We recently updated our synonyms, which turned the cluster state red for two of our four indexes. The first error we found was a shard failing to allocate:
"Failed shard on node [x]: failed to create index, failure IllegalArgumentException[Failed to build synonyms]; nested NotSerializableExceptionWrapper[parse_exception: Invalid synonyms rule at line 2; nested IllegalArgumentException[term: termination of pregnancy analyzed to a token (pregnancy) with position increment != 1 (got: 2)]"
The allocate_explanation was "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy".
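For anyone wanting to reproduce that explanation, it comes from the cluster allocation explain API. Below is a minimal sketch of such a request using only the Python standard library. The base URL, index name and shard number are placeholders (assumptions, not our real values), and the helper names `build_explain_body` and `explain_allocation` are hypothetical.

```python
import json
import urllib.request


def build_explain_body(index, shard, primary=True):
    """Build the request body that asks about one specific shard."""
    return {"index": index, "shard": shard, "primary": primary}


def explain_allocation(base_url, index, shard, primary=True):
    """POST the body to /_cluster/allocation/explain and return the parsed JSON.

    base_url is an assumption, e.g. "http://localhost:9200".
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/allocation/explain",
        data=json.dumps(build_explain_body(index, shard, primary)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `explain_allocation("http://localhost:9200", "my_index", 0).get("allocate_explanation")` against a cluster with an unassigned primary should return a message like the one quoted above; the field is only expected for unassigned shards, hence the `.get`.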
We understand why the error above occurred: the analysis chain applies the stopwords filter first and the synonyms filter last, so a synonym term that contains a stopword is analyzed with a position gap. We had agreed a procedure whereby no stopwords were to be entered into the synonyms file, but a stopword was entered by accident.
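A check along the following lines could enforce that procedure before the file is deployed. It is only a sketch: the stopword set is an illustrative assumption (it must match whatever the stop filter on the index actually uses), the sample rules are made up, and splitting on whitespace is a simplification of what the real analyzer does. The function name `find_stopword_violations` is hypothetical.

```python
# Assumption: a small illustrative stopword set, not our real configuration.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def find_stopword_violations(lines, stopwords=STOPWORDS):
    """Return (line_number, term, stopword) for each synonym term containing a stopword."""
    violations = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines and comments in the Solr-style synonyms format.
        if not line or line.startswith("#"):
            continue
        # Both sides of an explicit "=>" mapping hold comma-separated terms.
        for side in line.split("=>"):
            for term in side.split(","):
                term = term.strip()
                for token in term.lower().split():
                    if token in stopwords:
                        violations.append((line_number, term, token))
    return violations


# Made-up sample rules, for illustration only.
sample = [
    "# medical synonyms",
    "abortion, termination of pregnancy",
    "tv => television",
]
print(find_stopword_violations(sample))
# [(2, 'termination of pregnancy', 'of')]
```

Running something like this as a pre-deployment step, and refusing to restart the service if the list is non-empty, would catch the accidental stopword before any shard tries to rebuild its analyzers.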
What we don't understand is why the synonyms update caused the state to turn red and not recover. Since this happened on the live instance, we quickly rebuilt the indexes and deleted the old ones when we realised they weren't recoverable. Looking at the logs, there is some interesting information about "failed to list shard for shard_store on node".
Please see: https://gist.github.com/imranazad/7436c43bb7ca87a1ce1f64b988d22a83
Just to add context: when we update the synonyms file, we restart the Elasticsearch service via the command line.
So what made the indexes unrecoverable? I'm not convinced it was the direct result of the synonyms update. Yes, that would have stopped the shard allocation, but it shouldn't have made the indexes unrecoverable.