Several problems with my Elasticsearch cluster

Medidou · January 31, 2020, 1:23pm

Hello everyone,

I have set up an Elasticsearch cluster with three nodes: two are in one data center (node-1 and node-2) and one is in another data center (node-3).

The nodes communicate with each other over HTTPS.
Here is the Elasticsearch version running on each node:

{
  "name" : "node-01",
  "cluster_name" : "cluster",
  "cluster_uuid" : "//////////////////////",
  "version" : {
    "number" : "6.8.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "65b////",
    "build_date" : "2019-05-15T20:06:13.172855Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

{
  "name" : "node-02",
  "cluster_name" : "cluster",
  "cluster_uuid" : "//////////////////////",
  "version" : {
    "number" : "6.8.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "65b////",
    "build_date" : "2019-05-15T20:06:13.172855Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

{
  "name" : "node-03",
  "cluster_name" : "cluster",
  "cluster_uuid" : "//////////////////////",
  "version" : {
    "number" : "6.8.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "65b////",
    "build_date" : "2019-05-15T20:06:13.172855Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

I have quite a few recurring problems in the logs. I would like to get rid of the [WARN] entries so that the cluster is clean and the monitoring is no longer polluted.

Here are the recurring errors.

For the first node:

[2020-01-31T13:43:01,318][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-kibana-6-2020.01.31][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,319][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-kibana-6-2020.01.30][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,320][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES][2]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,321][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,322][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES][3]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,323][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-kibana-6-2020.01.29][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,324][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-es-6-2020.01.28][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,324][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES][1]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,325][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES][4]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,326][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES][2]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,331][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-es-6-2020.01.27][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,332][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-es-6-2020.01.26][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,333][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-kibana-6-2020.01.26][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,334][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-es-6-2020.01.25][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,335][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-kibana-6-2020.01.25][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,336][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-kibana-6-2020.01.24][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,337][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.monitoring-es-6-2020.01.24][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,338][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES_03][3]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,339][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES_03][2]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,340][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES_03][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,350][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [archivage-config][3]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,351][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.kibana_1][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,352][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [.kibana_task_manager][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,355][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES_03][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,348][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES_03][1]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,349][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES_03][4]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,369][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [nom d'un index sur ES_03][2]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,371][WARN ][o.e.g.G.InternalReplicaShardAllocator] [Node-1] [searchguard][0]: failed to list shard for shard_store on node [l4lQuRv6e-BaHsOTQ]
[2020-01-31T13:43:01,465][WARN ][o.e.d.z.PublishClusterStateAction] [Node-1] publishing cluster state with version [46646] failed for the following nodes: [[{Node-3}{l4lQuRv6e-BaHsOTQ}{tx8Yiai7QZalfilMPzPj9g}{192.168.112.173}{192.168.112.173:9300}{ml.machine_memory=8339591168, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]]

For the second node:

[2020-01-31T13:41:26,401][WARN ][r.suppressed             ] [Node-2] path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}
[2020-01-31T13:41:38,959][WARN ][r.suppressed             ] [Node-2] path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}
[2020-01-31T13:41:46,505][WARN ][r.suppressed             ] [Node-2] path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}
[2020-01-31T13:41:56,570][WARN ][r.suppressed             ] [Node-2] path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}
[2020-01-31T13:42:11,709][WARN ][r.suppressed             ] [Node-2] path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}
[2020-01-31T13:42:26,746][WARN ][r.suppressed             ] [Node-2] path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}
[2020-01-31T13:42:39,325][WARN ][r.suppressed             ] [Node-2] path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}
[2020-01-31T13:42:46,888][WARN ][r.suppressed             ] [Node-2] path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}

And finally, here are the WARN entries from the third node:

[2020-01-31T13:39:51,694][WARN ][o.e.x.m.MonitoringService] [Node-3] monitoring execution failed
[2020-01-31T13:40:10,894][WARN ][r.suppressed             ] [Node-3] path: /_xpack/monitoring/_bulk, params: {system_id=kibana, system_api_version=6, interval=10000ms}
[2020-01-31T13:40:40,891][WARN ][r.suppressed             ] [Node-3] path: /_xpack/monitoring/_bulk, params: {system_id=kibana, system_api_version=6, interval=10000ms}
[2020-01-31T13:41:01,696][WARN ][o.e.x.m.MonitoringService] [Node-3] monitoring execution failed
[2020-01-31T13:41:10,894][WARN ][r.suppressed             ] [Node-3] path: /_xpack/monitoring/_bulk, params: {system_id=kibana, system_api_version=6, interval=10000ms}
[2020-01-31T13:41:40,894][WARN ][r.suppressed             ] [Node-3] path: /_xpack/monitoring/_bulk, params: {system_id=kibana, system_api_version=6, interval=10000ms}
[2020-01-31T13:42:10,901][WARN ][r.suppressed             ] [Node-3] path: /_xpack/monitoring/_bulk, params: {system_id=kibana, system_api_version=6, interval=10000ms}
[2020-01-31T13:42:11,696][WARN ][o.e.x.m.MonitoringService] [Node-3] monitoring execution failed

Thanks in advance for your help.
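
To see which WARN messages dominate on each node, the log excerpts above can be tallied per node and logger. The following is a minimal Python sketch under stated assumptions: the regular expression is only inferred from the lines quoted in this post, and the names `LOG_LINE` and `summarize_warnings` are hypothetical helpers, not part of Elasticsearch.

```python
import re
from collections import Counter

# Inferred from the excerpts above (an assumption, not an official format):
# [timestamp][LEVEL ][logger] [node] message
LOG_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\]"
    r" \[(?P<node>[^\]]+)\]"
    r" (?P<msg>.*)"
)


def summarize_warnings(lines):
    """Count WARN entries per (node, logger) pair; non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "WARN":
            counts[(match.group("node"), match.group("logger"))] += 1
    return counts


# Three lines copied from the Node-3 excerpt above.
sample = [
    "[2020-01-31T13:39:51,694][WARN ][o.e.x.m.MonitoringService] [Node-3] monitoring execution failed",
    "[2020-01-31T13:40:10,894][WARN ][r.suppressed             ] [Node-3] path: /_xpack/monitoring/_bulk, params: {system_id=kibana, system_api_version=6, interval=10000ms}",
    "[2020-01-31T13:41:01,696][WARN ][o.e.x.m.MonitoringService] [Node-3] monitoring execution failed",
]

for (node, logger), count in summarize_warnings(sample).most_common():
    print(f"{node} {logger}: {count}")
```

Feeding the full log file through the same function (one line per list entry) should make it easier to tell a single root cause from several unrelated warnings.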

ahmed_charafouddine · January 31, 2020, 1:39pm

How did you design your cluster? What are the node roles? Could you share the configuration of your nodes?

Medidou · January 31, 2020, 1:54pm

Node 1:

#---------------------------------- Cluster -----------------------------------
#
cluster.name: cluster.es
#
# ------------------------------------ Node ------------------------------------
#
node.name: node_1
#
# ----------------------------------- Paths ------------------------------------
#
path.data: /var/lib/elasticsearch
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
bootstrap.memory_lock: true
#
# ---------------------------------- Network -----------------------------------
#
network.host: 192.168.111.171
#
# --------------------------------- Discovery ----------------------------------
#
#discovery.zen.ping.unicast.hosts: ["VM-1", "VM-2"]
discovery.zen.ping.unicast.hosts: ["192.168.0.172", "192.168.0.173"]
#discovery.zen.ping.unicast.hosts.resolve_timeout: 30s
discovery.zen.fd.ping_timeout: 40s
discovery.zen.fd.ping_retries: 10
discovery.zen.minimum_master_nodes: 2
#
# --------------------------------- Premium features -------------------------
#
#Disable premium features
xpack.security.enabled: false
searchguard.enterprise_modules_enabled: false

# -------------------------------- SearchGuard -------------------------------

#SSL security on the transport layer (for SG administration and inter-node communication)
searchguard.ssl.transport.pemcert_filepath: /etc/elasticsearch/config/cert/es_1.pem
searchguard.ssl.transport.pemkey_filepath: /etc/elasticsearch/config/cert/es_1.key
searchguard.ssl.transport.pemtrustedcas_filepath: /etc/elasticsearch/config/cert/root-ca_1.pem


#Declare other nodes of the cluster
searchguard.nodes_dn:
- CN=es_2.toto-tata.com,OU=escluster,O=ES toto,DC=toto-tata,DC=com
- CN=es_3.toto-tata.com,OU=escluster,O=ES toto,DC=toto-tata,DC=com

#Enable hostname verification. Disable if node hostname does not match node certificate
searchguard.ssl.transport.enforce_hostname_verification: false
searchguard.ssl.transport.resolve_hostname: false

#Admin certificate declaration for Searchguard administration
searchguard.authcz.admin_dn:
- CN=admin.toto-tata.com,OU=escluster,O=ES toto,DC=toto-tata,DC=com

#
#
# -------------------------------------- SSL ----------------------------
#SSL security on the REST layer (End users, Kibana, etc.)
searchguard.ssl.http.enabled: true
searchguard.ssl.http.pemcert_filepath: /etc/elasticsearch/config/cert/toto-tata.crt
searchguard.ssl.http.pemkey_filepath: /etc/elasticsearch/config/cert/toto-tata.key
searchguard.ssl.http.pemtrustedcas_filepath: /etc/elasticsearch/config/cert/toto-tata-root-ca.crt

Node 2:

# ---------------------------------- Cluster -----------------------------------
#
cluster.name: cluster.es
#
# ------------------------------------ Node ------------------------------------
#
node.name: node_2
#
# ----------------------------------- Paths ------------------------------------
#
path.data: /var/lib/elasticsearch
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
bootstrap.memory_lock: true
#
# ---------------------------------- Network -----------------------------------
#
network.host: 192.168.0.172
#
# --------------------------------- Discovery ----------------------------------
#
#discovery.zen.ping.unicast.hosts: ["VM-03", "VM-01"]
discovery.zen.ping.unicast.hosts: ["192.168.0.171", "192.168.0.173"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 2
discovery.zen.fd.ping_timeout: 40s
discovery.zen.fd.ping_retries: 10
#
# ------------------------------ Premium Feature
xpack.security.enabled: false
searchguard.enterprise_modules_enabled: false
#
# -------------------------------- SearchGuard -------------------------------
#
#SSL security on the transport layer (for SG administration and inter-node communication)
searchguard.ssl.transport.pemcert_filepath: /etc/elasticsearch/config/cert/es_2.pem
searchguard.ssl.transport.pemkey_filepath: /etc/elasticsearch/config/cert/es_2.toto-toto.com.key
searchguard.ssl.transport.pemtrustedcas_filepath: /etc/elasticsearch/config/cert/root-ca.pem

#Declare other nodes of the cluster
searchguard.nodes_dn:
- CN=es_1.toto-toto.com,OU=escluster,O=ES toto,DC=toto-toto,DC=com
- CN=es_3.toto-toto.com,OU=escluster,O=ES toto,DC=toto-toto,DC=com

#Enable hostname verification. Disable if node hostname does not match node certificate
searchguard.ssl.transport.enforce_hostname_verification: false
searchguard.ssl.transport.resolve_hostname: false

#Admin certificate declaration for Searchguard administration
searchguard.authcz.admin_dn:
- CN=admin.toto-toto.com,OU=escluster,O=ES toto,DC=toto-toto,DC=com

#
#
# -------------------------------------- SSL ----------------------------
#SSL security on the REST layer (End users, Kibana, etc.)
searchguard.ssl.http.enabled: true
searchguard.ssl.http.pemcert_filepath: /etc/elasticsearch/config/cert/toto-toto.crt
searchguard.ssl.http.pemkey_filepath: /etc/elasticsearch/config/cert/toto-toto.key
searchguard.ssl.http.pemtrustedcas_filepath: /etc/elasticsearch/config/cert/toto-toto-root-ca.crt

Node 3:

```
# ---------------------------------- Cluster -----------------------------------
#
cluster.name: cluster.es
#
# ------------------------------------ Node ------------------------------------
#
node.name: node_3
#
# ----------------------------------- Paths ------------------------------------
#
path.data: /var/lib/elasticsearch
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
bootstrap.memory_lock: true
#
# ---------------------------------- Network -----------------------------------
#
network.host: 192.168.0.173
#
# --------------------------------- Discovery ----------------------------------
#
#discovery.zen.ping.unicast.hosts: ["VM-3", "VM-2"]
discovery.zen.ping.unicast.hosts: ["192.168.0.171", "192.168.0.172"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 2
discovery.zen.fd.ping_timeout: 40s
discovery.zen.fd.ping_retries: 10
#
# --------------------------------- Disable Premium Feature
#
xpack.security.enabled: false
searchguard.enterprise_modules_enabled: false

# -------------------------------- SearchGuard -------------------------------

#SSL security on the transport layer (for SG administration and inter-node communication)
searchguard.ssl.transport.pemcert_filepath: /etc/elasticsearch/config/cert/es_3.pem
searchguard.ssl.transport.pemkey_filepath: /etc/elasticsearch/config/cert/es_3.key
searchguard.ssl.transport.pemtrustedcas_filepath: /etc/elasticsearch/config/cert/root-ca.pem

#Declare other nodes of the cluster
searchguard.nodes_dn:
- CN=es_1.toto-tata.com,OU=escluster,O=ES toto,DC=toto-tata,DC=com
- CN=es_2.toto-tata.com,OU=escluster,O=ES toto,DC=toto-tata,DC=com

#Enable hostname verification. Disable if node hostname does not match node certificate
searchguard.ssl.transport.enforce_hostname_verification: false
searchguard.ssl.transport.resolve_hostname: false

#Admin certificate declaration for Searchguard administration
searchguard.authcz.admin_dn:
- CN=admin.toto-tata.com,OU=escluster,O=ES toto,DC=toto-tata,DC=com

#
#
# -------------------------------------- SSL ----------------------------
#SSL security on the REST layer (End users, Kibana, etc.)
searchguard.ssl.http.enabled: true
searchguard.ssl.http.pemcert_filepath: /etc/elasticsearch/config/cert/toto-tata.crt
searchguard.ssl.http.pemkey_filepath: /etc/elasticsearch/config/cert/toto-tata.key
searchguard.ssl.http.pemtrustedcas_filepath: /etc/elasticsearch/config/cert/toto-tata-root-ca.crt
```
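
The comment in the configurations above gives the split-brain rule as "total number of master-eligible nodes / 2 + 1". As a small worked check, here is a sketch of that arithmetic in Python. It assumes integer division and that all three nodes are master-eligible (no node role is set in the configurations shown); the function name is hypothetical.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum from the config comment: master-eligible nodes / 2 + 1 (integer division)."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


# Three master-eligible nodes, as assumed for this cluster.
print(minimum_master_nodes(3))
```

With three master-eligible nodes the result matches the `discovery.zen.minimum_master_nodes: 2` value set on each node.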

ahmed_charafouddine · January 31, 2020, 2:04pm

Looking at your configuration, it seems you forgot to configure the role of each node.
A link that might help: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

Medidou · January 31, 2020, 2:32pm

Yes, but by default the cluster organizes the roles by itself, and judging by the errors in the logs, that does not seem to be the problem.

dadoonet · February 3, 2020, 10:36am

No. Configuring roles, especially for such a small cluster, is not useful in my opinion.

dadoonet · February 3, 2020, 10:39am

What do these return:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v
GET /_cat/shards?v
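
If the Kibana console is not at hand, the same requests can be sent from a script. This is a minimal Python sketch using only the standard library. The base URL `https://localhost:9200` and the helper names are assumptions for illustration, and authentication (which Search Guard may require on this cluster) is not handled.

```python
import ssl
import urllib.request

# The five requests listed above, as URL paths.
DIAGNOSTIC_PATHS = [
    "/",
    "/_cat/nodes?v",
    "/_cat/health?v",
    "/_cat/indices?v",
    "/_cat/shards?v",
]


def diagnostic_urls(base_url):
    """Build one full URL per diagnostic request."""
    return [base_url.rstrip("/") + path for path in DIAGNOSTIC_PATHS]


def fetch(url, ca_file=None, timeout=10.0):
    """GET a URL over HTTPS and return the body as text.

    ca_file is an optional path to the CA certificate that signed the
    node's HTTP certificate; without it the system trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical address; only prints the URLs, no request is sent here.
    for url in diagnostic_urls("https://localhost:9200"):
        print(url)
```

Calling `fetch(url, ca_file=...)` on each URL would then print the same output as the console requests.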

Please format your code/logs to make all of this readable. I did it for you for your question and your first reply. You can use the </> tool or markdown.

Medidou · February 3, 2020, 10:55am

Hello,

Thanks for your replies. After investigation, the problem was that ES-03 had to go through a router that times out sessions lasting more than 2 hours.

I moved it to a different network and that problem is solved. I will follow up on the other problems to see whether they were related.
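
For context, a device that drops idle sessions is the kind of problem TCP keepalive probes are meant to address: they put periodic traffic on an otherwise silent connection. The sketch below shows the mechanism on a plain Python socket. It is only an illustration, not how Elasticsearch configures its own connections, and the timing values are arbitrary assumptions. On Linux the system-wide default idle time before the first probe (`net.ipv4.tcp_keepalive_time`) is 7200 seconds.

```python
import socket


def enable_keepalive(sock, idle_s=300, interval_s=60, probes=5):
    """Turn on TCP keepalive for a socket; timing values are illustrative only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timing options below are platform-specific,
    # so each one is only set where the constant exists.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


# No connection is made; this only shows that the option is applied.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Keeping the idle time well below the router's session timeout is the general idea; moving the node to another network, as done here, avoids the issue altogether.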

dadoonet · February 3, 2020, 12:57pm

Note, however, that we advise against having nodes in several different geographic zones.
If the latency between your nodes is higher than on a local network, it is better to use cross-cluster replication (commercial license).
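
To get a rough idea of the latency between two nodes, one option is to time TCP connections to the transport port. This is a minimal Python sketch, assuming the transport port is reachable from where the script runs. The function name is hypothetical, and connect time is only a coarse proxy for network round-trip time.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=3.0):
    """Median time in milliseconds to open a TCP connection to host:port."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


if __name__ == "__main__":
    # Demo against a throwaway listener on the loopback interface,
    # so the example runs without a real cluster.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    port = listener.getsockname()[1]
    print(f"{tcp_connect_latency_ms('127.0.0.1', port):.3f} ms")
    listener.close()
```

Running it from node-1 against node-2 and then against node-3 on port 9300 would show how much slower the link to the other data center is compared with the local one.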

system · March 2, 2020, 12:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
