Elasticsearch - major design flaw with node ID

Elasticsearch seems to have a major design flaw when it comes to being deployed in a Docker Swarm cluster.

Each Elasticsearch node is identified by a randomly generated node ID.

This node ID is stored in the database (not easily accessible/editable).

If the Elasticsearch node goes down (for whatever reason), a new Elasticsearch node ID is generated when the node is brought back up. This renders the old node ID, along with all of its associated data, void.

In other words, a completely new node (or even a whole new Elasticsearch cluster) has to be created if a single node goes down, and all previous data is lost…

Has anyone been able to figure out how to provide a static Elasticsearch node ID? Or how to get Elasticsearch to use the node name instead of the node ID to identify an Elasticsearch node? Ideally this would be a simple YAML configuration parameter.

I don't think this is an Elasticsearch issue or flaw; it seems related to how you are deploying it.

The node ID and other Elasticsearch data are stored in the data path. When using containers, this data path needs to be a persistent store; if a node goes down and the system spins up another node to take its place, that will be a completely new node, and this is expected.

To use the data from the node that went down, you need to bring that node back up or configure a new container to use its persisted data folder.
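For example, a minimal sketch of that second option, reusing a persisted data folder in a replacement container (the host path, container name, and image are placeholders, not taken from your setup):

# Start a replacement container that mounts the old node's persisted
# data folder, so it comes up with the same node ID and data.
docker run -d --name es-replacement \
  -v /opt/esdata:/usr/share/elasticsearch/data \
  docker.elastic.co/elasticsearch/elasticsearch:7.16.1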

Also, if you have replicas, the data from the node that went down will be recreated on the new nodes.

If I'm not wrong, the node ID is only generated when you spin up an Elasticsearch instance for the first time with an empty data directory; if you use the same data directory on another instance, it will reuse the node ID and its data.
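You can verify this by comparing the node IDs before and after a restart; if the same data directory is mounted, the ID should not change. A quick check (endpoint and credentials are placeholders):

# List each node's full ID and name via the cat nodes API.
curl -s -u elastic:PASSWORD 'http://localhost:9200/_cat/nodes?v&full_id=true&h=id,name'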

How are you persisting the data path?

Just to reinforce what @leandrojmp said: you can't lose the node ID without losing all the other data on the node too. Exposing the node ID as a config parameter wouldn't fix that.


Hi Leandro,

Thanks for taking the time to respond.

Yes, when an Elasticsearch Docker container instance is spun up, an Elasticsearch node ID is generated. This node ID is stored (persisted) in the data path.

If the container goes down, a new container is spun up (automatically) with a new Elasticsearch node ID. This node ID does not match the node ID persisted in the data path (it does NOT use the existing node ID stored in the data path), and the Elasticsearch instance keeps restarting. Effectively the whole cluster has to be recreated, because the Elasticsearch instance is no longer recognised in the cluster…

As far as I know, there is no way of setting the node ID value so that it remains constant if the container goes down.

I can post the deployment yml file if that is helpful, but it will be tomorrow now…

Again, thanks for looking at this…

You will need to post your config files and also some logs to help us understand what your issue is.
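Something along these lines should capture the relevant logs (the stack and service names are guesses; adjust to yours):

# Show the task history for the service, including failed containers.
docker service ps --no-trunc mystack_Elasticsearch
# Dump the service logs with timestamps to a file for posting.
docker service logs --timestamps mystack_Elasticsearch > elasticsearch-node1.log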

Each Elasticsearch node has a unique node ID that can't be changed, and it is not exposed by any configuration setting. This node ID is created the first time Elasticsearch runs and joins a cluster, and it is persisted in the data path specified by the path.data setting in elasticsearch.yml.
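If you want to confirm that the node state was actually persisted, you can look inside the data path from the container; a rough sketch, assuming the 7.x on-disk layout and a container named es01:

# The node metadata (including the node ID) lives under the data path.
docker exec es01 ls /usr/share/elasticsearch/data/nodes/0/_state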

Since you are using containers, this path.data needs to be an external volume that is mounted into your container; it cannot be ephemeral storage inside the container. But I'm assuming that you are already using external volumes.
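As an illustration, a named volume is one way to keep path.data outside the container's lifecycle (the volume and service names here are made up):

# Create a named volume and attach it to the service as the data path.
docker volume create esdata01
docker service update \
  --mount-add type=volume,source=esdata01,target=/usr/share/elasticsearch/data \
  mystack_Elasticsearch

Note that with local named volumes in a swarm the data stays on one host, so this only works together with a placement constraint pinning the service to that host.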

I do not use Docker much, but from your description, when your Elasticsearch node container goes down, Docker Swarm will automatically spin up another container. Since this is done in an automated way, this seems to be a completely new container using a different path.data, which is, in the end, a completely new Elasticsearch node.

What will happen after this container is started depends on how your cluster is configured.

If you have at least 3 master nodes and your indices have replicas, this new container should be able to join the cluster, and the data that was in the old container will be recreated in this new one from the replicas.
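For example, making sure an index keeps at least one replica of each shard (index name and credentials are placeholders):

# Set the replica count so the data survives the loss of a single node.
curl -s -u elastic:PASSWORD -X PUT 'http://localhost:9200/my-index/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 1}}'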

If you have a 2-node cluster or a single-node cluster, then when your container goes down your entire cluster goes down, and there is no cluster for this new container to join. You need to recover the cluster before anything else, which means bringing back the old container or using its data path in a new one.

So, you need to share how your cluster is configured, but from your description, Elasticsearch is working as expected and the issue is related to the way you are deploying it.

This seems to be the fundamental misunderstanding. Elasticsearch will only create a new node ID after making absolutely sure that there isn't one in the data path. It's quite strict about this: if it looks like there might be one there but it can't read it for some reason (permissions, file corruption etc) then it will report an error and won't create a new node ID. Since you're seeing it create a new node ID, there must be absolutely nothing that looks like a node ID in its data path.
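One quick way to rule out an empty or unreadable data path is to inspect the mounted directory from the host right after the container stops; a sketch, with the path as a placeholder:

# Check that the bind-mounted data path actually contains node state.
ls -la /path/to/mounted/data
find /path/to/mounted/data -maxdepth 3 -name '_state'

If those come back empty, the replacement container is not seeing the old node's files at all.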


The deployment configuration is below.
Some notes:

- The data path is persisted to a local volume. The cluster works fine (read/write), so I assume the node ID is appropriately stored.
- The issue can easily be recreated by scaling the Elasticsearch node replica down and up; the commands are sketched below (I'll confirm tomorrow and get logs).
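For reference, the scaling steps are roughly as follows (the service name is a guess based on the stack file below):

# Scale the Elasticsearch service down to zero, then back up,
# forcing swarm to replace the container.
docker service scale mystack_Elasticsearch=0
docker service scale mystack_Elasticsearch=1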

version: "3.3"

services:

  Elasticsearch:
    image: lcl02164.test.softwaregrp.net/efk/elasticsearch-7-16-1
    logging:
      driver: "json-file"
      options:
        max-size: 30m
    environment:
      - discovery.type=zen
      - node.name=node-01
      - network.host=0.0.0.0
      - cluster.name=k-test-cluster
      - discovery.seed_hosts=es_2,es_3
      - cluster.initial_master_nodes=node-01,node-02,node-03
      - bootstrap.memory_lock=false
      - "ES_JAVA_OPTS=-Xms16384m -Xmx16384m"
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=changeMe
    ports:
      - "9200:9200"
      - 9300
    networks:
      - private
    volumes:
      - /opt/mount1/Elasticsearch/data:/usr/share/Elasticsearch/data
      - /opt/mount1/Elasticsearch/logs:/usr/share/Elasticsearch/logs
    deploy:
      placement:
        constraints: [node.hostname == lcl02165.test.softwaregrp.net]
      replicas: 1

  es_2:
    image: lcl02164.test.softwaregrp.net/efk/elasticsearch-7-16-1
    logging:
      driver: "json-file"
      options:
        max-size: 30m
    environment:
      - discovery.type=zen
      - node.name=node-02
      - network.host=0.0.0.0
      - cluster.name=k-test-cluster
      - discovery.seed_hosts=elasticsearch,es_3
      - cluster.initial_master_nodes=node-01,node-02,node-03
      - bootstrap.memory_lock=false
      - "ES_JAVA_OPTS=-Xms16384m -Xmx16384m"
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=changeMe
    ports:
      - "9201:9200"
      - 9300
    networks:
      - private
    volumes:
      - /opt/mount1/elasticsearch/data:/usr/share/elasticsearch/data
      - /opt/mount1/elasticsearch/logs:/usr/share/elasticsearch/logs
    deploy:
      placement:
        constraints: [node.hostname == lcl02446.test.softwaregrp.net]

  es_3:
    image: lcl02164.test.softwaregrp.net/efk/elasticsearch-7-16-1
    logging:
      driver: "json-file"
      options:
        max-size: 30m
    environment:
      - discovery.type=zen
      - node.name=node-03
      - network.host=0.0.0.0
      - cluster.name=k-test-cluster
      - discovery.seed_hosts=elasticsearch,es_2
      - cluster.initial_master_nodes=node-01,node-02,node-03
      - bootstrap.memory_lock=false
      - "ES_JAVA_OPTS=-Xms16384m -Xmx16384m"
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=changeMe
    ports:
      - "9202:9200"
      - 9300
    networks:
      - private
    volumes:
      - /opt/mount1/elasticsearch/data:/usr/share/elasticsearch/data
      - /opt/mount1/elasticsearch/logs:/usr/share/elasticsearch/logs
    deploy:
      placement:
        constraints: [node.hostname == lcl02586.test.softwaregrp.net]

  kibana:
    image: lcl02164.test.softwaregrp.net/efk/kibana-7-12-1
    logging:
      driver: "json-file"
      options:
        max-size: 30m
    environment:
      ELASTICSEARCH_URL: http://elasticsearch:9200
      SERVER_SSL_ENABLED: "true"
      ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES: /usr/share/kibana/config/server-chain.crt
      SERVER_SSL_KEY: /usr/share/kibana/config/localhost.key
      SERVER_SSL_CERTIFICATE: /usr/share/kibana/config/localhost.crt
    volumes:
      - ./kibana/kibana.yml:/usr/share/kibana/config/kibana.yml
      - ./kibana/server-chain.crt:/usr/share/kibana/config/server-chain.crt
      - ./kibana/localhost.crt:/usr/share/kibana/config/localhost.crt
      - ./kibana/localhost.key:/usr/share/kibana/config/localhost.key
    ports:
      - 5601:5601
    networks:
      - private
    depends_on:
      - Elasticsearch
      - es_2
      - es_3
    deploy:
      placement:
        constraints: [node.role == manager]

networks:
  private:
    external: true


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.