Regular crashes

cko · January 10, 2019, 6:02pm

Hello

For some time now, I have been running into problems on my Elasticsearch cluster that I have not been able to resolve.

I have 5 nodes in my cluster:

64 GB of RAM (32 GB dedicated to the heap)
2x20-core CPUs
7200 RPM HDDs on 3 of the 5 machines
10K RPM HDDs on 2 of the 5 machines

I have about 130 indices and 1200 shards.
Each index is between 200 MB and 1 TB.

Very regularly (every day, in fact), a node crashes completely. I have to reboot the machine, because restarting the service is impossible.
After that, recovery takes a very long time (up to 12 hours)...

Here is the kind of log I see regularly:

[2019-01-10T03:38:49,281][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-p2] [gc][37497] overhead, spent [501ms] collecting in the last [1.3s]
[2019-01-10T04:45:20,341][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-p2] [gc][41385] overhead, spent [425ms] collecting in the last [1.1s]
[2019-01-10T04:54:37,939][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-p2] [gc][41930] overhead, spent [419ms] collecting in the last [1.4s]
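
To put numbers on these lines, here is a minimal Python sketch that extracts the two durations from each `overhead` message and computes the fraction of the window spent in garbage collection. The regular expression and the function name `gc_overhead_ratio` are illustrative assumptions based only on the three lines quoted above, not on an official description of the log format.

```python
import re

# Pattern inferred from the JvmGcMonitorService lines quoted above (an assumption,
# not an official grammar): "spent [501ms] collecting in the last [1.3s]".
GC_LINE = re.compile(
    r"overhead, spent \[(?P<spent>[\d.]+)(?P<spent_unit>ms|s|m)\] "
    r"collecting in the last \[(?P<window>[\d.]+)(?P<window_unit>ms|s|m)\]"
)

# Conversion factors for the units seen in the quoted logs.
UNIT_TO_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def gc_overhead_ratio(line):
    """Return GC time divided by window length, or None if the line does not match."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    spent = float(match["spent"]) * UNIT_TO_SECONDS[match["spent_unit"]]
    window = float(match["window"]) * UNIT_TO_SECONDS[match["window_unit"]]
    return spent / window


if __name__ == "__main__":
    sample_lines = [
        "[2019-01-10T03:38:49,281][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-p2] [gc][37497] overhead, spent [501ms] collecting in the last [1.3s]",
        "[2019-01-10T04:45:20,341][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-p2] [gc][41385] overhead, spent [425ms] collecting in the last [1.1s]",
        "[2019-01-10T04:54:37,939][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-p2] [gc][41930] overhead, spent [419ms] collecting in the last [1.4s]",
    ]
    for line in sample_lines:
        print(f"{gc_overhead_ratio(line):.0%}")
```

On the three lines above this gives roughly 39%, 39% and 30% of each window spent collecting.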

Or:

[2019-01-10T17:35:26,074][WARN ][o.e.c.s.ClusterService   ] [elasticsearch-p2] cluster state update task [zen-disco-receive(from master [master {elasticsearch-p3}{fneleMCmS3WT40g9CRVDtw}{sFFOLVQ1Qpq70Urokdeg6w}{10.5.10.13}{10.5.10.13:9300}{rack=elasticsearch-p3} committed version [108378]])] took [58.3s] above the warn threshold of 30s
[2019-01-10T17:38:36,360][WARN ][o.e.c.s.ClusterService   ] [elasticsearch-p2] cluster state update task [zen-disco-receive(from master [master {elasticsearch-p3}{fneleMCmS3WT40g9CRVDtw}{sFFOLVQ1Qpq70Urokdeg6w}{10.5.10.13}{10.5.10.13:9300}{rack=elasticsearch-p3} committed version [108379]])] took [3.1m] above the warn threshold of 30s

Do you have any leads or recommendations to guide my investigation and help resolve this problem?

Thanks in advance for your help.
Regards

dadoonet · January 10, 2019, 6:22pm

You probably have too many shards per node.

Take a look at

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right
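
As a rough illustration of the shard density in question, here is a small Python sketch using only the figures from the original post (5 nodes, about 130 indices, 1200 shards, 32 GB of heap per node). It assumes the shards are spread evenly across the nodes, which the thread does not confirm, and the class and property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterFigures:
    """Figures quoted in the original post; even shard distribution is an assumption."""
    node_count: int
    index_count: int
    total_shards: int
    heap_gb_per_node: float

    @property
    def shards_per_node(self):
        # Average only: real allocation may be uneven across nodes.
        return self.total_shards / self.node_count

    @property
    def shards_per_index(self):
        return self.total_shards / self.index_count

    @property
    def shards_per_heap_gb(self):
        return self.shards_per_node / self.heap_gb_per_node


if __name__ == "__main__":
    cluster = ClusterFigures(
        node_count=5, index_count=130, total_shards=1200, heap_gb_per_node=32
    )
    print(f"shards per node:      {cluster.shards_per_node:.0f}")
    print(f"shards per index:     {cluster.shards_per_index:.1f}")
    print(f"shards per GB of heap: {cluster.shards_per_heap_gb:.1f}")
```

With those figures each node holds about 240 shards on average, or roughly 9 shards per index. Whether that is too many depends on shard sizes and workload, which is what the sizing talks linked above address.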

cko · January 11, 2019, 4:20pm

Hello,

Thanks for your reply. I'll dig into that.

Regards
