Shard stuck in INITIALIZING


#1

Hi guys,

I'm having an issue; I've tried multiple approaches and I'm unable to find a proper solution.

I thought it was something with unassigned shards:

curl -XGET 'http://183.*.*.200:9200/_cluster/health?pretty&level=indices'

  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 325,
  "active_shards" : 325,
  "relocating_shards" : 0,
  "initializing_shards" : 1,
  "unassigned_shards" : 326,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 49.84662576687116,

...

So I set index.number_of_replicas: 0 and that solved it.
Well, sort of. Then I found another issue:

  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 325,
  "active_shards" : 325,
  "relocating_shards" : 0,
  "initializing_shards" : 1,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 99.69325153374233,
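
For reference, here's a sketch of the replica change mentioned above (the host is a placeholder; adjust to your cluster). On a one-node cluster, replica shards can never be assigned, so setting replicas to 0 across all indices clears the unassigned-shard count:

```shell
# Drop replicas to 0 on all indices (ES 2.x indices settings API).
curl -XPUT 'http://localhost:9200/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'
```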

When I look at the shards, one seems to be stuck in INITIALIZING:

curl -XGET http://183.*.*.200:9200/_cat/shards 

tracking-2016.09.28 3 p INITIALIZING                 185.31.158.200 tracking 
tracking-2016.09.28 4 p STARTED      1575741 360.4mb 185.31.158.200 tracking 
tracking-2016.09.28 1 p STARTED      1577167 240.3mb 185.31.158.200 tracking 
tracking-2016.09.28 2 p STARTED      1575764   239mb 185.31.158.200 tracking 
tracking-2016.09.28 0 p STARTED                      185.31.158.200 tracking 
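
A quick way to surface only the problem shards in output like the above (host is a placeholder):

```shell
# Filter the shard listing down to anything not in STARTED state
# (INITIALIZING, UNASSIGNED, RELOCATING).
curl -s 'http://localhost:9200/_cat/shards' | grep -v STARTED
```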


curl -XGET 'http://183.*.*.200:9200/_cat/recovery?v'

index               shard time     type  stage    source_host    target_host    repository snapshot files files_percent bytes bytes_percent total_files total_bytes translog translog_percent total_translog 
tracking-2016.09.28 0     19       store done     185.31.158.200 185.31.158.200 n/a        n/a      0     0.0%          0     0.0%          0           0           0        100.0%           0              
tracking-2016.09.28 1     20       store done     185.31.158.200 185.31.158.200 n/a        n/a      0     0.0%          0     0.0%          0           0           0        100.0%           0              
tracking-2016.09.28 2     359751   store done     185.31.158.200 185.31.158.200 n/a        n/a      0     100.0%        0     100.0%        121         247805835   2978     100.0%           2978           
tracking-2016.09.28 3     89734989 store translog 185.31.158.200 185.31.158.200 n/a        n/a      0     100.0%        0     100.0%        109         259401405   0        -1.0%            -1             

I've taken a look at /storage/tracking/data/elasticsearch/nodes/0/indices/tracking-2016.09.28/3/translog and found a few translog files there:

-rw-r--r-- 1 nobody 4294967294      43 Sep 28 16:21 translog-10.tlog
-rw-r--r-- 1 nobody 4294967294      20 Sep 28 16:19 translog-8.ckp
-rw-r--r-- 1 nobody 4294967294 4514316 Sep 28 16:17 translog-8.tlog
-rw-r--r-- 1 nobody 4294967294      20 Sep 28 16:21 translog-9.ckp
-rw-r--r-- 1 nobody 4294967294    4124 Sep 28 16:20 translog-9.tlog
-rw-r--r-- 1 nobody 4294967294      20 Sep 29 18:43 translog.ckp

I've read that if I rename translog-9.ckp to translog.ckp it may resolve the stuck state, but nothing changed after restarting the Elasticsearch service.

Elasticsearch version: 2.3.4

Is there anyone able to guide me in the right direction?

TIA


(Mark Walkom) #2

What do your logs show?


#3

Well, in logstash.log (not logstash.err) I found a warning related to the problematic index and shard:

"create" => {
    "_index" => "tracking-2016.09.28", "_type" => "snapshots", "_id" => "AVdxYTvWa6yUWy4kpHrw", "status" => 404, "error" => {
        "type" => "engine_closed_exception", "reason" => "CurrentState[CLOSED] Closed", "shard" => "3", "index" => "tracking-2016.09.28", "caused_by" => {
            "type" => "out_of_memory_error", "reason" => "Java heap space"
        }
    }
}

out_of_memory_error ... Java heap space.

What do you suggest to recover the shard from its current INITIALIZING state, and what can be done to prevent this from happening in the future?
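
Since the root cause here is Java heap exhaustion, one mitigation is giving the node more heap. On ES 2.x this comes from the ES_HEAP_SIZE environment variable; a sketch, assuming a package install (the paths and the 4g value are assumptions, not from this thread):

```shell
# Set in /etc/default/elasticsearch (Debian/Ubuntu) or
# /etc/sysconfig/elasticsearch (RHEL/CentOS), then restart the service.
# A common guideline is ~50% of system RAM, kept below ~31 GB so the
# JVM can keep compressed object pointers enabled.
ES_HEAP_SIZE=4g
```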

TIA


(Mark Walkom) #4

You need to look in your ES logs; that is where the shard is, after all.


#5

Strangely, /var/log/elasticsearch/ is empty...
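
If the default log directory is empty, the node may be writing logs somewhere else. A sketch for finding the real location (hypothetical paths; the directory is whatever path.logs resolves to):

```shell
# Check the static config for an overridden log path...
grep 'path.logs' /etc/elasticsearch/elasticsearch.yml

# ...and ask the running node what settings it actually started with.
curl -s 'http://localhost:9200/_nodes/settings?pretty' | grep -i logs
```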

[edit]
Apparently, Elasticsearch was able to heal itself:

[2016-10-06 17:53:55,503][WARN ][index.translog           ] [tracking] [tracking-2016.09.28][3] deleted previously created, but not yet committed, next generation [translog-10.tlog]. This can happen due to a tragic exception when creating a new generation

[2016-10-06 17:54:00,760][INFO ][cluster.routing.allocation] [tracking] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[tracking-2016.09.28][3]] ...]).

Everything seems to be working fine: the shard state changed to STARTED, and the index status changed to GREEN.
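
For anyone hitting the same thing, the recovery can be double-checked like this (host is a placeholder):

```shell
# Confirm the cluster is green again...
curl -XGET 'http://localhost:9200/_cluster/health?pretty'

# ...and that every shard of the affected index is STARTED.
curl -XGET 'http://localhost:9200/_cat/shards/tracking-2016.09.28?v'
```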


(system) #6