Hi there,
This may be quite a long story, but I think the strangeness of the issue
deserves the post.
I'm running an ES cluster in production (6 nodes, 1 index, 20 shards, 1
replica, 50GB of docs), and needed to add a new analyzer and convert
a field of an existing type to the multi_field data type, so that the
new analyzer could be used by one of the core-type fields inside it.
I successfully tested all the necessary steps on my local machine first
(same index structure), and created a script for each of them to
avoid typing mistakes.
I knew that the index has to be closed before adding a new analyzer.
So, after shutting down all running indexing and searching
processes, I executed the following command from a non-ES server (a
hub for prod servers):
curl -XPOST "http://{SERVER}:9200/items/_close"
... and no answer was received at all. The curl request hung
indefinitely, waiting for a response.
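
For reference, the client side of such a request can be bounded with
curl's --max-time option so that it fails instead of waiting forever.
This is only a minimal sketch: ES_HOST (standing in for {SERVER}) and
the 30-second default are assumptions, not values from the setup
described here. It only stops the client from waiting; it says nothing
about what the cluster does with the request.

```shell
# ES_HOST is a hypothetical placeholder for {SERVER}; adjust as needed.
ES_HOST="${ES_HOST:-localhost}"

# Ask ES to close an index, but give up after a fixed number of seconds
# instead of blocking forever on the response.
close_index() {
  local index="$1"
  local timeout="${2:-30}"   # seconds; arbitrary default (assumption)
  curl -sS --max-time "$timeout" -XPOST "http://${ES_HOST}:9200/${index}/_close"
}
```

It could be called as: close_index items 30 || echo "no answer from close"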
To check whether the index was still open, I ran a cluster_health
command and saw that all shard counts were 0 (zero), as if the index
were closed. The 'close' request was still hanging.
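
A health check of this kind can be sketched as below. The
_cluster/health endpoint is the standard one; the per-index path and
the level=shards parameter are what I would expect to work, but their
availability may depend on the ES version, and ES_HOST and the
10-second limit are again assumptions.

```shell
# ES_HOST is a hypothetical placeholder for {SERVER}; adjust as needed.
ES_HOST="${ES_HOST:-localhost}"

# Print the health of one index, broken down to shard level,
# without waiting more than 10 seconds for the answer.
cluster_health() {
  local index="$1"
  curl -sS --max-time 10 \
    "http://${ES_HOST}:9200/_cluster/health/${index}?level=shards&pretty=true"
}
```

Example call: cluster_health items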
Things got even weirder when I ran a Delete query that I use for
purging old docs and got results, but with 5 failed shards. Even more
strangely, I then ran a search query and got results with all shards OK.
So at this point I decided to kill the 'close' curl request and start
shutting down nodes to see how the cluster would respond. A
simple 'kill' command didn't do the job on some nodes, so I used
'kill -9'. After killing one node and checking the health, I got a red
cluster status. Cluster info and the different cluster_health outputs
are in this gist: https://gist.github.com/1602222
The cluster is configured to work with at least 4 servers, so I shut
down 3 of the 6 servers and started just 2 of them again. Only at that
point did the cluster start to recover properly.
IMPORTANT: I tried to do this task a week ago, with the very same
results. I ended up restarting all nodes one by one until the status
went green, but was not able to add the new analyzer. So I guess this
is not an occasional issue.
Finally, when the status was green, I tried to add the analyzer again
and everything went fine: closing the index, adding the analyzer,
opening the index and setting the new field mapping.
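
The four steps above can be sketched roughly as follows. This is not
the actual script: the analyzer name (my_analyzer), the type (item),
the field (title) and ES_HOST are made-up placeholders, the timeout is
arbitrary, and the JSON follows the syntax of ES versions that still
have multi_field, so it may need adjusting for other releases.

```shell
# All names below except the index "items" are hypothetical.
ES_HOST="${ES_HOST:-localhost}"
INDEX="items"

# es METHOD PATH [JSON_BODY]: one request with a time limit; --fail makes
# curl return non-zero on an HTTP error status (400 or above).
es() {
  local method="$1" path="$2" body="${3:-}"
  if [ -n "$body" ]; then
    curl -sS --fail --max-time 60 -X"$method" "http://${ES_HOST}:9200${path}" -d "$body"
  else
    curl -sS --fail --max-time 60 -X"$method" "http://${ES_HOST}:9200${path}"
  fi
}

# Close, add the analyzer, reopen, then update the mapping.
# The && chain stops at the first step that fails.
add_analyzer() {
  es POST "/${INDEX}/_close" &&
  es PUT "/${INDEX}/_settings" '{
    "analysis": {"analyzer": {"my_analyzer": {
      "type": "custom", "tokenizer": "standard", "filter": ["lowercase"]}}}
  }' &&
  es POST "/${INDEX}/_open" &&
  es PUT "/${INDEX}/item/_mapping" '{
    "item": {"properties": {"title": {
      "type": "multi_field",
      "fields": {
        "title":  {"type": "string"},
        "custom": {"type": "string", "analyzer": "my_analyzer"}}}}}
  }'
}
```

Running add_analyzer performs the steps in order; if the close request
fails or times out, the later steps are skipped.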
I have no idea what the reason could be, but I found this post that
may refer to a similar issue, as it mentions some DELETE executions,
and we run a job every night that executes a Delete_By_Query request
which deletes hundreds or thousands of records:
http://groups.google.com/group/elasticsearch/browse_frm/thread/5b83f8ad5d02fb84/49c7ab8c080c4241?lnk=gst&q=hang#49c7ab8c080c4241
Hope this helps to find the root of the issue.
Thanks for your patience if you've read this far.
Frederic