One node cluster stuck on initializing_shards


#1

Hi,

I have a 2.4.4 cluster with a single server/node (esnode1) containing only one 220GB index with 1 shard and zero replicas.

The index was responding fine, but after I cleanly restarted the server (EC2, 2 CPUs, 4GB RAM, 500GB SSD), the cluster state has been stuck on red with "initializing_shards" = 1 for quite a while now, even though there is no CPU or disk usage (the system is idle and not swapping).

I've already raised indices.recovery.max_bytes_per_sec to 50mb, and tried the instructions at https://www.elastic.co/guide/en/elasticsearch/guide/current/_rolling_restarts.html, without success.
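For reference, this is how I raised the recovery throttle, via a transient cluster settings update (requires a running cluster on localhost:9200; the 50mb value matches what the logs below confirm):

```shell
curl -XPUT 'localhost:9200/_cluster/settings' -d '
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "50mb"
  }
}'
```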

Right now I'm at a loss as to how to debug or understand this, since the logs (below) look pretty much normal. Any hints?

Edit: the behaviour described occurs if I set a 2GB heap for ES. However, with a 3GB heap the cluster status changes to green seconds after restart. Any hints on what could be causing this?

The output of /_cluster/health is:
{
  "cluster_name" : "escluster1",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 1,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 0.0
}

This is the log following a restart:

[2017-05-04 15:00:37,975][INFO ][node                     ] [esnode1] version[2.4.4], pid[2761], build[fcbb46d/2017-01-03T11:33:16Z]
[2017-05-04 15:00:37,976][INFO ][node                     ] [esnode1] initializing ...
[2017-05-04 15:00:38,534][INFO ][plugins                  ] [esnode1] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2017-05-04 15:00:38,563][INFO ][env                      ] [esnode1] using [1] data paths, mounts [[/mnt/esdata2 (/dev/xvdh1)]], net usable_space [226.3gb], net total_space [492gb], spins? [no], types [ext4]
[2017-05-04 15:00:38,563][INFO ][env                      ] [esnode1] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-05-04 15:00:40,379][INFO ][node                     ] [esnode1] initialized
[2017-05-04 15:00:40,380][INFO ][node                     ] [esnode1] starting ...
[2017-05-04 15:00:40,501][INFO ][transport                ] [esnode1] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2017-05-04 15:00:40,506][INFO ][discovery                ] [esnode1] escluster1/sv3aHhUjSyueq5N4_w14mQ
[2017-05-04 15:00:43,565][INFO ][cluster.service          ] [esnode1] new_master {esnode1}{sv3aHhUjSyueq5N4_w14mQ}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2017-05-04 15:00:43,595][INFO ][indices.recovery         ] [esnode1] updating [indices.recovery.max_bytes_per_sec] from [40mb] to [50mb]
[2017-05-04 15:00:43,631][INFO ][http                     ] [esnode1] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2017-05-04 15:00:43,632][INFO ][node                     ] [esnode1] started
[2017-05-04 15:00:43,651][INFO ][gateway                  ] [esnode1] recovered [1] indices into cluster_state

Edit: after switching the log level to DEBUG (still with a 2GB heap), the cluster state remains "red" and I can see the following messages being logged repeatedly every 30 seconds:

[2017-05-10 15:58:45,985][DEBUG][index.shard ] [esnode1] [myIndex][0] updateBufferSize: engine is closed; skipping
[2017-05-10 15:59:15,985][DEBUG][indices.memory ] [esnode1] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]
[2017-05-10 15:59:15,990][DEBUG][index.shard ] [esnode1] [myIndex][0] updateBufferSize: engine is closed; skipping
[2017-05-10 15:59:45,990][DEBUG][indices.memory ] [esnode1] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]
[2017-05-10 15:59:45,997][DEBUG][index.shard ] [esnode1] [myIndex][0] updateBufferSize: engine is closed; skipping
[2017-05-10 16:00:15,997][DEBUG][indices.memory ] [esnode1] recalculating shard indexing buffer, total is [203.1mb] with [1] active shards, each shard set to indexing=[203.1mb], translog=[64kb]


(Simon Willnauer) #2

what's the output of curl -XGET 'localhost:9200/_cat/recovery?v&pretty' --> https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-recovery.html


#3

Here's the formatted output of _cat/recovery

  • index myIndex
  • shard 0
  • time 112517
  • type store
  • stage translog
  • source_host 127.0.0.1
  • target_host 127.0.0.1
  • repository n/a
  • snapshot n/a
  • files 0
  • files_percent 100.0%
  • bytes 0
  • bytes_percent 100.0%
  • total_files 274
  • total_bytes 2.34847E+11
  • translog 0
  • translog_percent -1.0%
  • total_translog -1
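For scale, the total_bytes figure is consistent with the ~220GB index mentioned at the top once you account for decimal vs binary units (a quick check):

```python
# total_bytes reported by _cat/recovery (2.34847E+11)
total_bytes = 234_847_000_000

print(total_bytes / 1000**3)  # ~234.85 GB (decimal gigabytes)
print(total_bytes / 1024**3)  # ~218.72 GiB (binary), i.e. the ~220GB index
```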

#4

I fixed this by setting a fixed name for the node and raising the heap size to 3GB to accommodate the static segment data reported at "memory_in_bytes" : 2272548617 (roughly 2.1GB, which wouldn't fit in a 2GB heap). The cluster status now turns green seconds after starting or restarting.
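As a quick sanity check, the reported segment memory can be compared against the two heap sizes tried above (the 2272548617 figure is the "memory_in_bytes" value from the index stats):

```python
# Segment memory reported by the index stats ("memory_in_bytes").
segment_memory_bytes = 2272548617

two_gb = 2 * 1024**3    # the heap that got stuck on red
three_gb = 3 * 1024**3  # the heap that recovered in seconds

print(segment_memory_bytes / 1024**3)       # ~2.12 GiB of segment data
print(segment_memory_bytes > two_gb)        # True: exceeds the 2GB heap outright
print(segment_memory_bytes < three_gb)      # True: fits (barely) in the 3GB heap
```

Note that even when the raw segment data fits, the heap also has to hold indexing buffers, caches, and everything else, so "barely fits" would still be a tight configuration.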


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.