Hi,
[Using Elasticsearch 1.5.1]
I currently have an issue where the replicas of one of the 5 primary shards in an index have been stuck in INITIALIZING
state for well over 24 hours. The primary copy of that shard is marked as STARTED,
but I cannot retrieve stats for it.
Output of _cat/health:
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks
1442473639 07:07:19  BBB     yellow 3          3         63     30  0    2    0        0
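A minimal Python sketch for reading the cat health output: it pairs each header name with the value in the same position, assuming every value is a single token without spaces (true for the columns shown here, but a cluster name containing spaces would break it). The `init` column reads 2, which lines up with the two INITIALIZING replica copies in the shard listing.

```python
# Minimal sketch: turn `_cat/health?v` text (header row + one value row) into a dict.
# Assumption: every column value is a single whitespace-free token.

CAT_HEALTH = """\
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks
1442473639 07:07:19 BBB yellow 3 3 63 30 0 2 0 0
"""


def parse_cat_health(text):
    """Map each header name to the value in the same column position."""
    header, values = text.strip().splitlines()[:2]
    return dict(zip(header.split(), values.split()))


if __name__ == "__main__":
    health = parse_cat_health(CAT_HEALTH)
    # The `init` count should match the number of INITIALIZING rows in _cat/shards.
    print(health["status"], "initializing:", health["init"])
```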
Output of _cat/shards:
index shard prirep state        docs store ip            node
AAA_1 1     p      STARTED                 _____________ es-live-1
AAA_1 1     r      INITIALIZING            _____________ es-live-3
AAA_1 1     r      INITIALIZING            _____________ es-live-2
Output for another index that is healthy, where I can see the shard stats:
index       shard prirep state   docs    store ip            node
graphflow_1 4     p      STARTED 5071499 1.9gb _____________ es-live-1
graphflow_1 4     r      STARTED 5071499 1.9gb _____________ es-live-2
graphflow_1 0     p      STARTED 4620643 1.6gb _____________ es-live-1
graphflow_1 0     r      STARTED 4620643 1.6gb _____________ es-live-2
...
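To pull out only the problem copies from the cat shards listing, here is a minimal Python sketch that returns every row whose state is not STARTED. It assumes whitespace-separated columns and node names without spaces, and it relies only on the first four columns plus the last one, because docs and store can be blank (as they are for AAA_1 shard 1). Asking for explicit columns, e.g. `_cat/shards?h=index,shard,prirep,state,node`, should avoid that ambiguity.

```python
# Minimal sketch: list shard copies that are not STARTED in `_cat/shards?v` text.
# Assumptions: whitespace-separated columns, node names contain no spaces,
# and docs/store may be blank for copies that report no stats.

CAT_SHARDS = """\
index shard prirep state docs store ip node
AAA_1 1 p STARTED _____________ es-live-1
AAA_1 1 r INITIALIZING _____________ es-live-3
AAA_1 1 r INITIALIZING _____________ es-live-2
"""


def unhealthy_shards(cat_output):
    """Return (index, shard, prirep, state, node) for every copy not STARTED."""
    lines = cat_output.strip().splitlines()
    result = []
    for line in lines[1:]:  # skip the header row
        tokens = line.split()
        index, shard, prirep, state = tokens[:4]  # these four are always present
        # An UNASSIGNED copy has no node column, so only read it if one exists.
        node = tokens[-1] if len(tokens) > 4 else None
        if state != "STARTED":
            result.append((index, int(shard), prirep, state, node))
    return result


if __name__ == "__main__":
    for row in unhealthy_shards(CAT_SHARDS):
        print(row)
```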
I also see this in the logs:
[2015-09-17 07:17:53,082][DEBUG][action.admin.cluster.stats] [es-live-1] failed to execute on node [-gLPPrH_R4i5RFKYoeXO3w]
org.elasticsearch.index.engine.EngineClosedException: [AAA_1][1] CurrentState[CLOSED]
Originally, I was getting a lot of timeouts and some GC errors on the node that held the PRIMARY of the affected shard. The node became unresponsive and I had to restart it. Since then, the cluster has stayed yellow with this issue.
Search and aggregations seem to work, but when I try to run a scan/scroll query (using elasticsearch-hadoop for bulk analytics jobs), I get:
SearchPhaseExecutionException[Failed to execute phase [init_scan], all shards failed]
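To check whether the init_scan failure is tied to shard 1, one option is to issue the scan request directly, outside elasticsearch-hadoop. Below is a minimal Python sketch that only builds an Elasticsearch 1.x scan request (`search_type=scan` plus `scroll`) and can pin it to a single shard with the `preference=_shards:N` option. The host, page size, and match_all query are placeholder assumptions, and this is not necessarily the exact request elasticsearch-hadoop sends.

```python
# Minimal sketch: build (but do not send) an ES 1.x scan request.
# Assumptions: host, index, page size, and the match_all query are placeholders.
import json
from urllib.parse import urlencode


def build_scan_request(host, index, shard=None, scroll="1m", size=100):
    """Return (url, body) for an ES 1.x scan search, optionally pinned to one shard."""
    params = {"search_type": "scan", "scroll": scroll}
    if shard is not None:
        # `preference=_shards:N` restricts the search to shard N of the index.
        params["preference"] = "_shards:%d" % shard
    url = "%s/%s/_search?%s" % (host.rstrip("/"), index, urlencode(params))
    # With search_type=scan, `size` applies per shard and the first
    # response carries a scroll id but no hits.
    body = json.dumps({"query": {"match_all": {}}, "size": size})
    return url, body


if __name__ == "__main__":
    url, body = build_scan_request("http://localhost:9200", "AAA_1", shard=1)
    print(url)
    print(body)
```

Running the same request with and without the shard preference would show whether only shard 1 makes the scan fail.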
Any help appreciated.