I have a cluster with 5 nodes (3 master nodes and 2 data nodes) hosted on VMs.
I am using Elasticsearch version 7.17.8.
I need to replace these VMs (destroy them and create new ones).
I replaced the second master without any issues (going from Debian 10 to Debian 11).
But I am not able to replace the first master node. Elasticsearch does not start correctly, and I get this error message:
[2023-02-22T12:14:03,899][ERROR][o.e.x.d.l.DeprecationIndexingComponent] [elasticsearch-app-dc1-02.node.infra.prx.integ.dwadm.in] Bulk write of deprecation logs encountered some failures: [[QlHTeIYBxfWVwofIrfeN
UnavailableShardsException[[.ds-.logs-deprecation.elasticsearch-default-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-.logs-deprecation.elasticsearch-default-2
023.02.22-000001][0]] containing [2] requests]], Q1HTeIYBxfWVwofIrfeQ UnavailableShardsException[[.ds-.logs-deprecation.elasticsearch-default-2023.02.22-000001][0] primary shard is not active Timeout: [1m], requ
est: [BulkShardRequest [[.ds-.logs-deprecation.elasticsearch-default-2023.02.22-000001][0]] containing [2] requests]]]]
[2023-02-22T12:14:08,803][ERROR][o.e.x.i.h.ILMHistoryStore] [elasticsearch-app-dc1-02.node.infra.prx.integ.dwadm.in] failures: [{RFHTeIYBxfWVwofIwPfB=UnavailableShardsException[[.ds-ilm-history-5-2023.02.22-0000
01][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [index {[ilm-history-5][_doc][RFHTeIYBxfWVwofIwPfB], source[{"index":".ds-.logs-
deprecation.elasticsearch-default-2023.02.22-000001","policy":".deprecation-indexing-ilm-policy","@timestamp":1677064353748,"index_age":-51,"success":true,"state":{"phase":"new","phase_definition":"{\"policy\":\
".deprecation-indexing-ilm-policy\",\"version\":1,\"modified_date_in_millis\":1677064353116}","action_time":"1677064353748","phase_time":"1677064353748","action":"complete","step":"complete","creation_date":"167
7064353799","step_time":"1677064353748"}}]}]]]}]
[2023-02-22T12:15:08,807][ERROR][o.e.x.i.h.ILMHistoryStore] [elasticsearch-app-dc1-02.node.infra.prx.integ.dwadm.in] failures: [{RlHUeIYBxfWVwofIq_cl=UnavailableShardsException[[.ds-ilm-history-5-2023.02.22-0000
01][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [3] requests]], R1HUeIYBxfWVwofIq_cl=UnavailableShardsException[[.ds-ilm-history
-5-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [3] requests]], RVHUeIYBxfWVwofIq_cl=UnavailableShardsExceptio
n[[.ds-ilm-history-5-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [3] requests]]}]
[2023-02-22T12:16:08,810][ERROR][o.e.x.i.h.ILMHistoryStore] [elasticsearch-app-dc1-02.node.infra.prx.integ.dwadm.in] failures: [{SFHVeIYBxfWVwofIlfeI=UnavailableShardsException[[.ds-ilm-history-5-2023.02.22-0000
01][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [2] requests]], SVHVeIYBxfWVwofIlfeI=UnavailableShardsException[[.ds-ilm-history
-5-2023.02.22-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-ilm-history-5-2023.02.22-000001][0]] containing [2] requests]]}]
Do I need to execute something before or after the migration?
Do I need to change something in my config?
You can get the status by making a request to the _cluster/health endpoint.
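For example, something like the following, assuming the node listens on localhost:9200 (adjust the host and add credentials if security is enabled):

```shell
# Query cluster health; "pretty" formats the JSON response for reading.
curl -s -X GET "http://localhost:9200/_cluster/health?pretty"
```

The response includes the overall status (green/yellow/red) and the number of unassigned shards, which is useful when primary shards are reported as not active.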
How did you replace it? Did you exclude the node from allocation before replacing it? Did you create a new node, or does Elasticsearch use the same data directory?
If you didn't exclude the node from allocation before destroying it and then created a new node, you may have lost some data, which is what your log is saying: some primary shards are not available.
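For a data node, excluding it from allocation before shutdown could look roughly like this (the node name "elasticsearch-data-dc1-01" is a placeholder; run the requests against any node of a live cluster):

```shell
# Tell the cluster to move all shards off the node before it is destroyed.
curl -s -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent":{"cluster.routing.allocation.exclude._name":"elasticsearch-data-dc1-01"}}'

# Watch until the node no longer holds any shards; then it is safe to shut it down.
curl -s -X GET "http://localhost:9200/_cat/shards?v"
```

Once the replacement node has joined and the cluster is green again, clear the exclusion by setting it back to null.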
These nodes are master-only nodes, so they don't have data, right?
To replace the second one, I just shut down the VM and deployed a new one with the same config.
For the first one, I did the same thing and it did not work. So I restarted the old node, and it is working correctly now. My cluster is in green status.
Do I need to do something before this kind of operation even for a master node?
If they are configured as master nodes and do not have any data roles, then you are right: they will not hold any index data, only the cluster metadata, which is stored in the path.data configured in elasticsearch.yml.
Also, if they do not have data roles, this error is unrelated. Can you confirm that they do not have any data roles? Share the elasticsearch.yml of the masters.
But did you create a new master that joined the cluster, or did you just create a new machine, install Elasticsearch, and reuse the same metadata for the node? If you reused the data in path.data, it would be the same node; if you didn't reuse it, it would join as a new node.
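For reference, a master-only node in 7.x would typically be declared like this in elasticsearch.yml (node name and paths below are placeholders, not taken from the cluster in question):

```yaml
# elasticsearch.yml — master-only node (placeholder values)
cluster.name: my-cluster
node.name: elasticsearch-master-dc1-01
node.roles: [ master ]              # no "data" role, so the node holds no index shards
path.data: /var/lib/elasticsearch   # still used to store cluster metadata
```

If node.roles is absent entirely, the node gets all default roles, including data roles, which would make the unassigned-shard errors relevant after all.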
If they are indeed master-only nodes, it would be easier to add the new master-eligible nodes first and then remove the old masters.
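If you go that route, the sequence in 7.17 would be: start the new master-eligible node, let it join, then exclude the old master from the voting configuration before shutting it down. A sketch, with "old-master-01" as a placeholder node name and a cluster assumed on localhost:9200:

```shell
# Remove the old master from the voting configuration so the remaining
# master-eligible nodes keep a quorum while it is shut down.
curl -s -X POST "http://localhost:9200/_cluster/voting_config_exclusions?node_names=old-master-01"

# After the old node is permanently gone, clear the exclusions list so
# future master replacements are not blocked by stale entries.
curl -s -X DELETE "http://localhost:9200/_cluster/voting_config_exclusions?wait_for_removal=false"
```

This avoids the situation where destroying a master without warning leaves the cluster unable to elect, which matches the symptom of the first master refusing to come back cleanly.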