Elasticsearch master node times out during cluster recovery stage due to heavy GC

hrishikesh_prabhune · June 12, 2014, 10:54pm

I have a large Elasticsearch cluster with 18 nodes, each with a JVM heap
size of 25 GB. The cluster holds a total of 182 timestamped indices. Each
index has approximately 1500 aliases.
Whenever I do a full cluster restart, the master node goes into massive
garbage collection and is not able to recover the indices into the cluster
state. As a result, the cluster health shows:
"cluster_name" : "ES_cluster",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 18,
"number_of_data_nodes" : 18,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
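
For readers who want to detect this condition automatically, here is a minimal sketch that checks the health fields posted above. It assumes the fields are wrapped in braces to form valid JSON; the helper name `looks_unrecovered` is hypothetical, and the rule it applies (red status with every shard counter at zero) is only a heuristic drawn from this post, not an official Elasticsearch check.

```python
import json

# Health fields as posted above, wrapped in braces so they parse as JSON.
HEALTH_JSON = """
{
  "cluster_name": "ES_cluster",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 18,
  "number_of_data_nodes": 18,
  "active_primary_shards": 0,
  "active_shards": 0,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}
"""

SHARD_COUNTERS = (
    "active_primary_shards",
    "active_shards",
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
)


def looks_unrecovered(health: dict) -> bool:
    """Heuristic (an assumption, not an official rule): the status is red
    and every shard counter is zero, so no indices appear to be in the
    cluster state yet."""
    all_zero = all(health.get(key, 0) == 0 for key in SHARD_COUNTERS)
    return health.get("status") == "red" and all_zero


health = json.loads(HEALTH_JSON)
print(looks_unrecovered(health))
```

A red cluster that merely has unassigned shards would report a non-zero `unassigned_shards`, so the check would return False for it.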

After some time the ping timeouts kick in, and the master node is dropped
from the cluster and is unable to rejoin it. After it has been dropped,
a new master is elected, and it too goes into massive garbage collection.
It seems that too many aliases are causing the cluster state to explode.
I think my problem is similar to this question:
http://elasticsearch-users.115913.n3.nabble.com/Multiple-cluster-state-copies-in-memory-VS-many-aliases-td4046417.html
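
To give a sense of scale, the figures in this post can be multiplied out. This is only back-of-the-envelope arithmetic using the approximate numbers stated above; the function name is hypothetical, and it says nothing about how much heap each alias entry actually occupies.

```python
INDICES = 182             # timestamped indices, from the post
ALIASES_PER_INDEX = 1500  # approximate figure, from the post


def total_alias_entries(indices: int, aliases_per_index: int) -> int:
    """Number of alias entries the cluster state has to hold."""
    return indices * aliases_per_index


total = total_alias_entries(INDICES, ALIASES_PER_INDEX)
print(total)  # 273000
```

Every one of those entries is part of the cluster state that the master has to rebuild during a full cluster restart.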

I am using ES version 1.2.1 and Java version 1.8.0_05.

Does anyone know how to get around the 'too many aliases' problem without
deleting the aliases?

Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/072f84f1-f12d-42ab-809d-8a7d578be18d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

kimchy · June 15, 2014, 10:56pm

Heya, I tried to simulate what you are seeing, and there is a quick fix that makes a big difference. See pull request #6504 in elastic/elasticsearch, "Better default size for global index -> alias map". There are additional optimizations that we can make, but those will be a bit more complex.
