Shards stuck on initializing with Elasticsearch 2.1.1


(Jeffcharles) #1

We've been running a single-node development cluster for Logstash. Today we saw the following after running curl http://localhost:9200/_cat/health?v:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 
1454964515 20:48:35  elasticsearch red             1         1      0   0    0    4      138             8              45.1s                  0.0% 
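
The status line shows 4 initializing and 138 unassigned shards, with 0 active. A shard-level breakdown should be available with something like this (the level parameter accepts cluster, indices, and shards):

curl 'http://localhost:9200/_cluster/health?level=shards&pretty'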

Running curl http://localhost:9200/_cat/shards confirms that the shards of every index are unassigned, except for today's Logstash index, whose shards have been stuck in INITIALIZING for a few hours. There should be less than 30 megabytes of data in that index, so I'd expect it to initialize relatively quickly. Memory use is at about 40% of the max heap size and disk use is at 25%.
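
For reference, here's the kind of _cat call that should list the per-shard state along with why a shard is unassigned (the unassigned.reason column should be available on recent 2.x releases):

curl 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'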

Logs on startup look like:

elasticsearch_1 | [2016-02-08 20:34:04,242][INFO ][node                     ] [Marsha Rosenberg] version[2.1.1], pid[1], build[40e2c53/2015-12-15T13:05:55Z]
elasticsearch_1 | [2016-02-08 20:34:04,259][INFO ][node                     ] [Marsha Rosenberg] initializing ...
elasticsearch_1 | [2016-02-08 20:34:04,786][INFO ][plugins                  ] [Marsha Rosenberg] loaded [], sites []
elasticsearch_1 | [2016-02-08 20:34:05,236][INFO ][env                      ] [Marsha Rosenberg] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/xvdb)]], net usable_space [71.4gb], net total_space [98.3gb], spins? [possibly], types [ext4]
elasticsearch_1 | [2016-02-08 20:34:29,779][INFO ][node                     ] [Marsha Rosenberg] initialized
elasticsearch_1 | [2016-02-08 20:34:29,779][INFO ][node                     ] [Marsha Rosenberg] starting ...
elasticsearch_1 | [2016-02-08 20:34:30,269][WARN ][common.network           ] [Marsha Rosenberg] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.4}
elasticsearch_1 | [2016-02-08 20:34:30,269][INFO ][transport                ] [Marsha Rosenberg] publish_address {172.17.0.4:9300}, bound_addresses {[::]:9300}
elasticsearch_1 | [2016-02-08 20:34:30,351][INFO ][discovery                ] [Marsha Rosenberg] elasticsearch/HhtKQwzYQg60QAg6cfjV5A
elasticsearch_1 | [2016-02-08 20:34:33,695][INFO ][cluster.service          ] [Marsha Rosenberg] new_master {Marsha Rosenberg}{HhtKQwzYQg60QAg6cfjV5A}{172.17.0.4}{172.17.0.4:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
elasticsearch_1 | [2016-02-08 20:34:33,943][WARN ][common.network           ] [Marsha Rosenberg] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.4}
elasticsearch_1 | [2016-02-08 20:34:33,943][INFO ][http                     ] [Marsha Rosenberg] publish_address {172.17.0.4:9200}, bound_addresses {[::]:9200}
elasticsearch_1 | [2016-02-08 20:34:33,943][INFO ][node                     ] [Marsha Rosenberg] started
elasticsearch_1 | [2016-02-08 20:34:38,528][INFO ][gateway                  ] [Marsha Rosenberg] recovered [35] indices into cluster_state
elasticsearch_1 | [2016-02-08 20:35:59,339][DEBUG][cluster.service          ] [Marsha Rosenberg] processing [cluster_update_settings]: took 203ms done applying updated cluster_state (version: 4, uuid: uEqGltC0RROvDsJ4zP2wHw)
elasticsearch_1 | [2016-02-08 20:35:59,339][DEBUG][cluster.service          ] [Marsha Rosenberg] processing [reroute_after_cluster_update_settings]: execute
elasticsearch_1 | [2016-02-08 20:35:59,460][DEBUG][cluster.service          ] [Marsha Rosenberg] processing [reroute_after_cluster_update_settings]: took 120ms no change in cluster_state

Calling curl http://localhost:9200/_recovery?pretty shows 100% recovery on index size and files, but -1.0% on the translog for all four shards. I'm not sure whether that indicates an error.
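
To narrow that down to recoveries still in flight, the recovery API should also accept active_only and detailed flags, e.g.:

curl 'http://localhost:9200/_recovery?pretty&active_only=true&detailed=true'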

I've tried rebooting the node a few times, but the shards stay stuck in INITIALIZING. I've also tried setting index.routing.allocation.disable_allocation to false, with no change in behaviour. Is there any way to recover the node, or do we have to start over with no data?
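
As far as I know, the disable_allocation settings are pre-2.0 and were replaced by cluster.routing.allocation.enable in 2.x, so re-enabling allocation through the cluster settings API should look something like this (all is the default; primaries, new_primaries, and none are the other values):

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'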


(Jeffcharles) #2

We opted to delete our data and start over. Running more than one node is a lesson we've learned for the future.

