Creating a cluster under Docker

Hello, I'm running into a problem creating a 3-node cluster with Elasticsearch.
Note that I'm using Docker, and that I'm not using the official Elastic images for this.

When I start my container stack with docker-compose, the Elasticsearch containers come up and can see each other on the network (I verified this with curl from inside the containers).
But the cluster never forms: each node creates its own cluster and proclaims itself master of it, and each of these clusters has the same UUID.

So I end up with three one-node clusters instead of one three-node cluster.

Here is my elasticsearch.yml configuration:

cluster.name: elastic-cluster
node.name: ${SERVICE}
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0
http.port: ${ELASTICSEARCH_PORT}

discovery.zen.ping.unicast.hosts:
  - ${HOST1}
  - ${HOST2}

cluster.initial_master_nodes:
  - elastic
  - elastic-2
  - elastic-3

I have already tried:

  • discovery.zen.ping.unicast.hosts as an inline array ["host1","host2"]
  • discovery.seed_hosts, both as a list and as an inline array
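For reference, the discovery.seed_hosts variant I tried looked like this (a sketch with the same placeholders as above; on 7.x, discovery.seed_hosts is the replacement for the deprecated discovery.zen.ping.unicast.hosts):

```yaml
# 7.x replacement for discovery.zen.ping.unicast.hosts
discovery.seed_hosts:
  - ${HOST1}
  - ${HOST2}
```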

Next, here is my docker-compose configuration:

version: '2.2'
services:
  elastic:
    image: elastic-debian:test
    container_name: elastic
    mem_limit: 4000m
    mem_reservation: 4000m
    cpus: '2'
    ports:
      - ${ELASTICSEARCH_PORT}:${ELASTICSEARCH_PORT}
    volumes:
      - mydata-test:/var/lib/elasticsearch
      - elastic_log-test:/var/log/elasticsearch
    networks:
    - elk-test
    restart: unless-stopped
    environment:
      - SERVICE=elastic
      - ELASTICSEARCH_PORT=${ELASTICSEARCH_PORT}
      - HOST1=elastic-2
      - HOST2=elastic-3

  elastic-2:
    image: elastic-debian:test
    container_name: elastic-2
    mem_limit: 4000m
    mem_reservation: 4000m
    cpus: '2'
    volumes:
      - mydata2-test:/var/lib/elasticsearch
      - elastic_log2-test:/var/log/elasticsearch
    networks:
    - elk-test
    restart: unless-stopped
    environment:
      - SERVICE=elastic-2
      - ELASTICSEARCH_PORT=${ELASTICSEARCH_PORT}
      - HOST1=elastic
      - HOST2=elastic-3
      
  elastic-3:
    image: elastic-debian:test
    container_name: elastic-3
    mem_limit: 4000m
    mem_reservation: 4000m
    cpus: '2'
    volumes:
      - mydata3-test:/var/lib/elasticsearch
      - elastic_log3-test:/var/log/elasticsearch
    networks:
    - elk-test
    restart: unless-stopped
    environment:
      - SERVICE=elastic-3
      - ELASTICSEARCH_PORT=${ELASTICSEARCH_PORT}
      - HOST1=elastic
      - HOST2=elastic-2

  kibana:
    image: kibana-debian:test
    container_name: kibana
    mem_limit: 2000m
    mem_reservation: 1000m
    cpus: '1'
    ports:
      - ${KIBANA_PORT}:${KIBANA_PORT}
    networks:
    - elk-test
    restart: unless-stopped
    environment:
      - KIBANA_PORT=${KIBANA_PORT}
      - KIBANA_PASSWORD=${KIBANA_PASSWORD}
      - HOST_URL=${HOST_URL}
      - ELASTICSEARCH_PORT=${ELASTICSEARCH_PORT}
    
volumes:
  mydata-test:
  elastic_log-test:
  elastic_log2-test:
  elastic_log3-test:
  mydata2-test:
  mydata3-test:

networks:
  elk-test:
    driver: bridge

The contents of my .env:

KIBANA_PASSWORD=password
HOST_URL=http://elk.home
KIBANA_PORT=5601
ELASTICSEARCH_PORT=9200

Here is the result of a curl from the elastic container:

root@abce750895f3:/usr/share/elasticsearch# curl -X GET "http://elastic:9200/_cat/nodes?v&pretty"
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.16.4            4          94   0    0.10    0.05     0.14 dilm      *      elastic
root@abce750895f3:/usr/share/elasticsearch# curl -X GET "http://elastic-2:9200/_cat/nodes?v&pretty"
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.16.2            6          94   1    0.09    0.04     0.14 dilm      *      elastic-2
root@abce750895f3:/usr/share/elasticsearch# curl -X GET "http://elastic-3:9200/_cat/nodes?v&pretty"
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.16.3            6          94   1    0.15    0.06     0.15 dilm      *      elastic-3

And finally, here are the logs from the elastic container:

[2020-06-25T15:00:37,596][INFO ][o.e.t.TransportService   ] [elastic] publish_address {192.168.32.2:9300}, bound_addresses {0.0.0.0:9300}
[2020-06-25T15:00:38,622][INFO ][o.e.b.BootstrapChecks    ] [elastic] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2020-06-25T15:00:38,699][INFO ][o.e.c.c.Coordinator      ] [elastic] cluster UUID [kUP771ffRi6FNrtu069u4g]
[2020-06-25T15:00:40,360][WARN ][o.e.m.j.JvmGcMonitorService] [elastic] [gc][young][2][14] duration [1.1s], collections [1]/[2.1s], total [1.1s]/[4.1s], memory [129.6mb]->[119mb]/[3.9gb], all_pools {[young] [65.6mb]->[18.3mb]/[133.1mb]}{[survivor] [15.4mb]->[14.5mb]/[16.6mb]}{[old] [48.5mb]->[86.6mb]/[3.8gb]}
[2020-06-25T15:00:40,367][WARN ][o.e.m.j.JvmGcMonitorService] [elastic] [gc][2] overhead, spent [1.1s] collecting in the last [2.1s]
[2020-06-25T15:00:40,403][INFO ][o.e.c.s.MasterService    ] [elastic] elected-as-master ([1] nodes joined)[{elastic}{T-B42F0tSH25NhASqQQiZQ}{lo9vvcWXQT6CNKBg9AVFdw}{192.168.32.2}{192.168.32.2:9300}{dilm}{ml.machine_memory=16797630464, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 3, version: 37, delta: master node changed {previous [], current [{elastic}{T-B42F0tSH25NhASqQQiZQ}{lo9vvcWXQT6CNKBg9AVFdw}{192.168.32.2}{192.168.32.2:9300}{dilm}{ml.machine_memory=16797630464, xpack.installed=true, ml.max_open_jobs=20}]}
[2020-06-25T15:00:40,646][INFO ][o.e.c.s.ClusterApplierService] [elastic] master node changed {previous [], current [{elastic}{T-B42F0tSH25NhASqQQiZQ}{lo9vvcWXQT6CNKBg9AVFdw}{192.168.32.2}{192.168.32.2:9300}{dilm}{ml.machine_memory=16797630464, xpack.installed=true, ml.max_open_jobs=20}]}, term: 3, version: 37, reason: Publication{term=3, version=37}
[2020-06-25T15:00:40,866][INFO ][o.e.h.AbstractHttpServerTransport] [elastic] publish_address {192.168.32.2:9200}, bound_addresses {0.0.0.0:9200}
[2020-06-25T15:00:40,869][INFO ][o.e.n.Node               ] [elastic] started
[2020-06-25T15:00:41,787][INFO ][o.e.l.LicenseService     ] [elastic] license [6b57049c-72dc-47fd-a0b4-ec8a564af9ac] mode [basic] - valid
[2020-06-25T15:00:41,788][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [elastic] Active license is now [BASIC]; Security is disabled
[2020-06-25T15:00:41,841][INFO ][o.e.g.GatewayService     ] [elastic] recovered [4] indices into cluster_state
[2020-06-25T15:00:44,377][WARN ][o.e.m.j.JvmGcMonitorService] [elastic] [gc][6] overhead, spent [641ms] collecting in the last [1s]
[2020-06-25T15:00:44,487][INFO ][o.e.c.r.a.AllocationService] [elastic] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.security-7][0], [.kibana_1][0], [.kibana_task_manager_1][0]]]).

elastic-2:

[2020-06-25T13:56:44,059][INFO ][o.e.t.TransportService   ] [elastic-2] publish_address {192.168.16.2:9300}, bound_addresses {0.0.0.0:9300}
[2020-06-25T13:56:45,115][INFO ][o.e.b.BootstrapChecks    ] [elastic-2] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2020-06-25T13:56:45,162][INFO ][o.e.c.c.Coordinator      ] [elastic-2] cluster UUID [kUP771ffRi6FNrtu069u4g]
[2020-06-25T13:56:46,704][INFO ][o.e.c.s.MasterService    ] [elastic-2] elected-as-master ([1] nodes joined)[{elastic-2}{T-B42F0tSH25NhASqQQiZQ}{VbX2MhanTHS24I2X1uQl9w}{192.168.16.2}{192.168.16.2:9300}{dilm}{ml.machine_memory=16797630464, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 2, version: 22, delta: master node changed {previous [], current [{elastic-2}{T-B42F0tSH25NhASqQQiZQ}{VbX2MhanTHS24I2X1uQl9w}{192.168.16.2}{192.168.16.2:9300}{dilm}{ml.machine_memory=16797630464, xpack.installed=true, ml.max_open_jobs=20}]}
[2020-06-25T13:56:46,711][INFO ][o.e.m.j.JvmGcMonitorService] [elastic-2] [gc][young][3][14] duration [893ms], collections [1]/[1s], total [893ms]/[3.3s], memory [149.2mb]->[119.2mb]/[3.9gb], all_pools {[young] [85.3mb]->[18mb]/[133.1mb]}{[survivor] [15.2mb]->[14.4mb]/[16.6mb]}{[old] [48.6mb]->[86.6mb]/[3.8gb]}
[2020-06-25T13:56:46,717][WARN ][o.e.m.j.JvmGcMonitorService] [elastic-2] [gc][3] overhead, spent [893ms] collecting in the last [1s]
[2020-06-25T13:56:47,166][INFO ][o.e.c.s.ClusterApplierService] [elastic-2] master node changed {previous [], current [{elastic-2}{T-B42F0tSH25NhASqQQiZQ}{VbX2MhanTHS24I2X1uQl9w}{192.168.16.2}{192.168.16.2:9300}{dilm}{ml.machine_memory=16797630464, xpack.installed=true, ml.max_open_jobs=20}]}, term: 2, version: 22, reason: Publication{term=2, version=22}
[2020-06-25T13:56:47,346][INFO ][o.e.h.AbstractHttpServerTransport] [elastic-2] publish_address {192.168.16.2:9200}, bound_addresses {0.0.0.0:9200}
[2020-06-25T13:56:47,349][INFO ][o.e.n.Node               ] [elastic-2] started
[2020-06-25T13:56:48,089][INFO ][o.e.l.LicenseService     ] [elastic-2] license [6b57049c-72dc-47fd-a0b4-ec8a564af9ac] mode [basic] - valid
[2020-06-25T13:56:48,091][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [elastic-2] Active license is now [BASIC]; Security is disabled
[2020-06-25T13:56:48,121][INFO ][o.e.g.GatewayService     ] [elastic-2] recovered [1] indices into cluster_state
[2020-06-25T13:56:49,966][INFO ][o.e.c.r.a.AllocationService] [elastic-2] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.security-7][0]]]).

elastic-3:

[2020-06-25T15:00:37,624][INFO ][o.e.t.TransportService   ] [elastic-3] publish_address {192.168.32.4:9300}, bound_addresses {0.0.0.0:9300}
[2020-06-25T15:00:38,662][INFO ][o.e.b.BootstrapChecks    ] [elastic-3] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2020-06-25T15:00:38,718][INFO ][o.e.c.c.Coordinator      ] [elastic-3] cluster UUID [kUP771ffRi6FNrtu069u4g]
[2020-06-25T15:00:39,126][INFO ][o.e.c.s.MasterService    ] [elastic-3] elected-as-master ([1] nodes joined)[{elastic-3}{T-B42F0tSH25NhASqQQiZQ}{l7XNbU3KTQysNBR2ymDMAQ}{192.168.32.4}{192.168.32.4:9300}{dilm}{ml.machine_memory=16797630464, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 3, version: 27, delta: master node changed {previous [], current [{elastic-3}{T-B42F0tSH25NhASqQQiZQ}{l7XNbU3KTQysNBR2ymDMAQ}{192.168.32.4}{192.168.32.4:9300}{dilm}{ml.machine_memory=16797630464, xpack.installed=true, ml.max_open_jobs=20}]}
[2020-06-25T15:00:39,927][INFO ][o.e.m.j.JvmGcMonitorService] [elastic-3] [gc][young][2][14] duration [708ms], collections [1]/[1.7s], total [708ms]/[3.2s], memory [120.7mb]->[113.1mb]/[3.9gb], all_pools {[young] [60.5mb]->[17.4mb]/[133.1mb]}{[survivor] [11.5mb]->[9.7mb]/[16.6mb]}{[old] [48.7mb]->[86.8mb]/[3.8gb]}
[2020-06-25T15:00:39,933][INFO ][o.e.m.j.JvmGcMonitorService] [elastic-3] [gc][2] overhead, spent [708ms] collecting in the last [1.7s]
[2020-06-25T15:00:40,138][INFO ][o.e.c.s.ClusterApplierService] [elastic-3] master node changed {previous [], current [{elastic-3}{T-B42F0tSH25NhASqQQiZQ}{l7XNbU3KTQysNBR2ymDMAQ}{192.168.32.4}{192.168.32.4:9300}{dilm}{ml.machine_memory=16797630464, xpack.installed=true, ml.max_open_jobs=20}]}, term: 3, version: 27, reason: Publication{term=3, version=27}
[2020-06-25T15:00:40,389][INFO ][o.e.h.AbstractHttpServerTransport] [elastic-3] publish_address {192.168.32.4:9200}, bound_addresses {0.0.0.0:9200}
[2020-06-25T15:00:40,393][INFO ][o.e.n.Node               ] [elastic-3] started
[2020-06-25T15:00:41,269][INFO ][o.e.l.LicenseService     ] [elastic-3] license [6b57049c-72dc-47fd-a0b4-ec8a564af9ac] mode [basic] - valid
[2020-06-25T15:00:41,271][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [elastic-3] Active license is now [BASIC]; Security is disabled
[2020-06-25T15:00:41,303][INFO ][o.e.g.GatewayService     ] [elastic-3] recovered [1] indices into cluster_state
[2020-06-25T15:00:43,459][INFO ][o.e.c.r.a.AllocationService] [elastic-3] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.security-7][0]]]).

That should be everything needed, config and logs included.
Thanks in advance to anyone who can shed some light so I can finally get this cluster working; if anything is missing, let me know and I'll add it as quickly as possible.

Thanks.

Regards,
Benjamin


Hello Benjamin,

If elasticsearch.yml really contains those discovery settings, with several entries in cluster.initial_master_nodes, a single node cannot elect itself master on its own. So the most likely explanation is that elasticsearch.yml does not actually end up with those values, or that the environment-variable substitution is failing inside the containers.
Note that between tests you should wipe the contents of the data volumes so each attempt starts from scratch (for compose-managed named volumes, `docker-compose down -v` removes them). The fact that all three nodes report the same cluster UUID, and even the same node ID in your logs, suggests each one is starting from a pre-existing data directory rather than bootstrapping fresh.

With the same settings on 7.8.0, if I start only one of the 3 containers, I get:

es0              | {"type": "server", "timestamp": "2020-06-25T15:56:09,248Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "cluster0", "node.name": "es0", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [es0, es1, es2] to bootstrap a cluster: have discovered [{es0}{NpwAJ23iQvuDopQlW1Zuzw}{UGGBxLecTcaQvksnuPK7gA}{172.18.0.2}{172.18.0.2:9300}{dimrt}{xpack.installed=true, transform.node=true}]; discovery will continue using [] from hosts providers and [{es0}{NpwAJ23iQvuDopQlW1Zuzw}{UGGBxLecTcaQvksnuPK7gA}{172.18.0.2}{172.18.0.2:9300}{dimrt}{xpack.installed=true, transform.node=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

If elasticsearch.yml is baked into the image, it may be worth checking the file directly inside the container. Be careful: in YAML, indentation matters.
Also, as a first step, test without environment variables by mounting elasticsearch.yml as a volume, to see whether that works. Starting a single container should surface the error (docker-compose up -d service && docker-compose logs -f service | grep master).
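On the indentation point, one thing worth checking mechanically: YAML forbids tab characters for indentation, and a single stray tab in an `environment:` list is enough to break parsing or variable substitution. A quick sketch of the check, demonstrated on a throwaway file (run the same grep against your real docker-compose.yml and elasticsearch.yml):

```shell
# Write a deliberately broken YAML snippet (line 2 is tab-indented)
printf 'environment:\n\t- HOST2=elastic-3\n' > /tmp/compose-demo.yml

# Flag any line containing a tab; no output means no tabs
grep -n "$(printf '\t')" /tmp/compose-demo.yml
```

Any line this prints must be re-indented with spaces before docker-compose or Elasticsearch will read the file the way you expect.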

Finally, I would recommend starting from the official images as a base (they can still be customized; that lets you reuse the entrypoint scripts, for example, and keeps your setup closer to the documentation for those images).
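As a sketch of that approach, here is what one of the three nodes could look like on the official image (the version tag and heap size are examples to adapt). The official image's entrypoint maps environment variables like these directly onto Elasticsearch settings, so no custom elasticsearch.yml is needed:

```yaml
services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
    environment:
      - node.name=elastic
      - cluster.name=elastic-cluster
      - discovery.seed_hosts=elastic-2,elastic-3
      - cluster.initial_master_nodes=elastic,elastic-2,elastic-3
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
```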