Replacing an Elasticsearch node

Hello,

I have a cluster with 5 nodes (3 master nodes and 2 data nodes) hosted on VMs.
I am using Elasticsearch version 7.17.8.

I need to replace these VMs (destroy them and create new ones).
I replaced the second master without any issues (going from Debian 10 to Debian 11).
But I am not able to replace the first master node. Elasticsearch does not start correctly, and I get this error message:

[2023-02-22T12:14:03,899][ERROR][o.e.x.d.l.DeprecationIndexingComponent] [elasticsearch-app-dc1-02.node.infra.prx.integ.dwadm.in] Bulk write of deprecation logs encountered some failures: [[QlHTeIYBxfWVwofIrfeN UnavailableShardsException[[.ds-.logs-deprecation.elasticsearch-default-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-.logs-deprecation.elasticsearch-default-2023.02.22-000001][0]] containing [2] requests]], Q1HTeIYBxfWVwofIrfeQ UnavailableShardsException[[.ds-.logs-deprecation.elasticsearch-default-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-.logs-deprecation.elasticsearch-default-2023.02.22-000001][0]] containing [2] requests]]]]
[2023-02-22T12:14:08,803][ERROR][o.e.x.i.h.ILMHistoryStore] [elasticsearch-app-dc1-02.node.infra.prx.integ.dwadm.in] failures: [{RFHTeIYBxfWVwofIwPfB=UnavailableShardsException[[.ds-ilm-history-5-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [index {[ilm-history-5][_doc][RFHTeIYBxfWVwofIwPfB], source[{"index":".ds-.logs-deprecation.elasticsearch-default-2023.02.22-000001","policy":".deprecation-indexing-ilm-policy","@timestamp":1677064353748,"index_age":-51,"success":true,"state":{"phase":"new","phase_definition":"{\"policy\":\".deprecation-indexing-ilm-policy\",\"version\":1,\"modified_date_in_millis\":1677064353116}","action_time":"1677064353748","phase_time":"1677064353748","action":"complete","step":"complete","creation_date":"1677064353799","step_time":"1677064353748"}}]}]]]}]
[2023-02-22T12:15:08,807][ERROR][o.e.x.i.h.ILMHistoryStore] [elasticsearch-app-dc1-02.node.infra.prx.integ.dwadm.in] failures: [{RlHUeIYBxfWVwofIq_cl=UnavailableShardsException[[.ds-ilm-history-5-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [3] requests]], R1HUeIYBxfWVwofIq_cl=UnavailableShardsException[[.ds-ilm-history-5-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [3] requests]], RVHUeIYBxfWVwofIq_cl=UnavailableShardsException[[.ds-ilm-history-5-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [3] requests]]}]
[2023-02-22T12:16:08,810][ERROR][o.e.x.i.h.ILMHistoryStore] [elasticsearch-app-dc1-02.node.infra.prx.integ.dwadm.in] failures: [{SFHVeIYBxfWVwofIlfeI=UnavailableShardsException[[.ds-ilm-history-5-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [2] requests]], SVHVeIYBxfWVwofIlfeI=UnavailableShardsException[[.ds-ilm-history-5-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [2] requests]]}]

Do I need to run anything before or after the migration?
Do I need to change anything in my config?

Thanks.

What is the status of your cluster?

You can get the status by making a request to the _cluster/health endpoint.
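For example, with curl (a sketch; adjust the host, port, credentials, and TLS flags to your environment):

curl -k -u elastic "https://localhost:9200/_cluster/health?pretty"

The status field (green/yellow/red) and the unassigned_shards count are the first things to look at.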

How did you replace it? Did you exclude the node from allocation before replacing it? Did you create a new node, or is the Elasticsearch data directory the same?

If you didn't exclude the node from allocation before destroying it and you created a new node, you may have lost some data, which is what your log is saying: some primary shards are not active.
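For a data node, that exclusion looks roughly like this (a sketch; node-to-remove is a placeholder for the node name):

curl -k -u elastic -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.exclude._name": "node-to-remove"}}'

Wait until _cat/shards shows no shards left on that node before shutting it down, and afterwards clear the exclusion by setting the same key back to null.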

These nodes are only master nodes, so they don't have data, right?

To replace the second one, I just shut down the VM and deployed a new one with the same config.

For the first one, I did the same thing and it did not work. So I restarted the old node and it is working correctly now. My cluster is in Green status.

Even for a master node, do I need to do something before this kind of operation?

If they are configured as master-only nodes and do not have any data roles, then you are right: they will not hold any index data, only the cluster metadata, which is stored in the path.data configured in elasticsearch.yml.

Also, if they do not have data roles, this error is unrelated. Can you confirm that they do not have any data roles? Please share the elasticsearch.yml of the masters.
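A quick way to confirm the roles, with the same placeholder host and credentials as before:

curl -k -u elastic "https://localhost:9200/_cat/nodes?v&h=name,node.role,master"

For the masters, the important thing is that the node.role column contains no d (no data role).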

But did you create a new master that then joined the cluster, or did you just create a new machine, install Elasticsearch, and reuse the same metadata for the node? If you reused the data in path.data, it would be the same node; if you didn't, it would join as a new node.

If they are indeed master-only nodes, it would be easier to add the new master-eligible nodes first and then remove the old masters.
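In 7.x the safe way to take a master-eligible node out is the voting configuration exclusions API; a sketch, with old-master-name as a placeholder:

# before stopping the old master-eligible node
curl -k -u elastic -X POST "https://localhost:9200/_cluster/voting_config_exclusions?node_names=old-master-name"

# once the replacement has joined and the old node is gone
curl -k -u elastic -X DELETE "https://localhost:9200/_cluster/voting_config_exclusions"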

Do you have anything else in the logs?

bootstrap.memory_lock: false
cluster.initial_master_nodes: elasticsearch-app-dc1-02.domain.test
cluster.name: elasticsearch-infra-integ-01
discovery.seed_hosts:
- elasticsearch-app-dc1-02.domain.test:9300
- elasticsearch-app-dc2-01.domain.test:9300
- elasticsearch-app-dc3-03.domain.test:9300
- elasticsearch-db-dc1-02.domain.test:9300
- elasticsearch-db-dc2-01.domain.test:9300
http.host: 0.0.0.0
http.port: 9200
node.data: false
node.master: true
node.name: elasticsearch-app-dc1-02.domain.test
transport.host: 0.0.0.0
transport.port: 9300
xpack.security.authc.realms.file.file1.order: 0
xpack.security.authc.realms.native.native1.order: 1

#################################### Paths ####################################

# Paths to the data and log directories:

path.data: /opt/elasticsearch/data
path.logs: /var/log/elasticsearch

action.auto_create_index: true

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: "certificate"
xpack.security.transport.ssl.key: "/etc/elasticsearch/certs/server.key"
xpack.security.transport.ssl.certificate: "/etc/elasticsearch/certs/server.crt"
xpack.security.transport.ssl.certificate_authorities: "/etc/elasticsearch/certs/domain.test-intermediate-ca.crt"
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: "/etc/elasticsearch/certs/server.key"
xpack.security.http.ssl.certificate: "/etc/elasticsearch/certs/server.crt"
xpack.security.http.ssl.certificate_authorities: "/etc/elasticsearch/certs/domain.test-intermediate-ca.crt"

No, I didn't reuse the data. I just created a new server with the same name and same IP and reinstalled Elasticsearch.

Agreed, but I need to keep the same name and IP for the new servers.

Only these errors. The other logs are INFO.
