How to make nodes forget about the old/previous cluster?

I had (and still have) a running 3-node cluster, C1, in which all three nodes (es0, es1, es2) are master-eligible. Then, accessible within the same network, I wanted to set up a second cluster, C2 (nodes es3, es4, es5), to which I would restore S3 snapshots sourced from C1.

On first bootstrap, I mistakenly left C2's cluster.name in elasticsearch.yml equal to C1's. But even after changing C2's cluster.name to a different name and restarting with service elasticsearch restart, C2 still cannot form a quorum and keeps waiting for votes from C1's nodes. Here's the relevant log entry from es4, from /var/log/elasticsearch/<C2_corrected_name>_server.json:

{
  "cluster.name" : "<C2_corrected_name>",
  "component" : "o.e.c.c.ClusterFormationFailureHelper",
  "level" : "WARN",
  "message" : "master not discovered or elected yet, an election requires at least 2 nodes with ids from
    [Khkwo1UuSviqddELYnjxSw, YYrzUrOyS4Woe2Q_red0Fg, gP7n1qgQS5eDqzvvVQhKrw],
    have only discovered non-quorum
    [{es4}{8zpt-krUQN-SZt2hWl8bwA}{zS8t6lFTJSII-X3WzANAQ}{10.0.31.204}{10.0.31.204:9300}{cdfhilmrstw}, {es3}{0kVuUzneQamppCJtBZz75g}{Fnz4ga-dSuW9p7MSFapxLw}{10.0.9.20}{10.0.9.20:9300}{cdfhilmrstw}, {es5}{VfcgGh9FTUK85E1Xf8-NQ}{x3v3gEDXTla_WBY9FPCHGw}{10.0.37.24}{10.0.37.24:9300}{cdfhilmrstw}];
    discovery will continue using [10.0.9.20:9300, 10.0.37.24:9300] from hosts providers and [{es4}{8zpt-krUQN-SZt2hWl8bwA}{_zS8t6lFTJSII-X3WzANAQ}{10.0.31.204}{10.0.31.204:9300}{cdfhilmrstw}] from last-known cluster state; node term 5, last-accepted version 379 in term 5",
  "node.name" : "es4",
  "timestamp" : "",
  "type" : "server"
}

All six nodes run Elasticsearch 7.17.4 on Ubuntu 18.04.6.
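
For reference, after correcting the name, C2's discovery-related settings in /etc/elasticsearch/elasticsearch.yml (default deb-package path) look roughly like the following; the exact values are illustrative, reconstructed from the node names and addresses above:

grep -E '^(cluster\.name|discovery\.seed_hosts|cluster\.initial_master_nodes)' /etc/elasticsearch/elasticsearch.yml
# cluster.name: <C2_corrected_name>
# discovery.seed_hosts: ["10.0.9.20", "10.0.31.204", "10.0.37.24"]
# cluster.initial_master_nodes: ["es3", "es4", "es5"]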

How can I make C2's nodes (es3, es4, es5) forget about C1's nodes? Is elasticsearch-node the right tool?

Thanks.

This is (roughly) the situation described in this section of the docs, which also describes the remedy: delete the contents of their data paths.

Thanks David. Here's what I did, on each of the es3-es5 nodes mentioned in the OP, to form the new cluster:

service elasticsearch stop
# {path.data} = the data directory configured in elasticsearch.yml (/var/lib/elasticsearch by default on deb installs)
rm -rf {path.data}/*
service elasticsearch start
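
After restarting, a quick check along these lines (any of the three nodes, default HTTP port 9200, no TLS — both assumptions about my setup) should confirm a fresh 3-node cluster:

curl -s 'http://10.0.9.20:9200/_cluster/health?pretty'   # expect "number_of_nodes" : 3
curl -s 'http://10.0.9.20:9200/?pretty'                  # cluster_uuid should no longer match C1's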

I'd appreciate you confirming whether this is the ideal solution.

Also, thanks for insisting that elasticsearch-node should only be used as a last resort. Do I understand correctly that this is the preferred order to approach things:

  1. if you have snapshots, delete the data paths and restore from the snapshots
  2. if you do not have snapshots, only then risk using elasticsearch-node, as there is no third option anyway (rough sketch of that last resort below)
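
For completeness, my understanding is that the last-resort route in option 2 would look something like the sketch below, run per node while it is stopped. This is only my guess at the shape of it, not something I ran; the binary path assumes the deb package, and the elasticsearch-node docs warn it can lose data, so they should be read first:

service elasticsearch stop
# Detach the stopped node from the cluster it last belonged to, discarding its cluster
# metadata so it can bootstrap or join a different cluster (destructive, prompts for confirmation).
/usr/share/elasticsearch/bin/elasticsearch-node detach-cluster
service elasticsearch start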

What I'm not clear on is: doesn't option 1 guarantee data loss between 'now' and 'time of last snapshot', as Thomas also pointed out in the same thread?

I had initially, perhaps naively, hoped there'd be some way to surgically excise the 'stale' nodes (in my case, es0, es1, es2 above) from {path.data}/nodes/0/_state/.

It depends. Do they actually hold any data yet? If not, no problem. If they do, you have a few options. For instance you could stop ingestion and take a snapshot so that there is no difference between "now" and "time-of-last-snapshot".
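
For example, with writes stopped, a final snapshot along these lines, run against the cluster that currently holds the data, would capture everything up to "now" before the data paths are wiped; the host and repository name are placeholders, and the repository must already be registered:

curl -s -X PUT 'http://<host>:9200/_snapshot/s3_repo/final-before-wipe?wait_for_completion=true'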

Nope, sorry, there are no user-serviceable parts inside the data directory. See these docs, for instance:

WARNING: Don’t modify anything within the data directory or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the data directory, then Elasticsearch may fail, reporting corruption or other data inconsistencies, or may appear to work correctly having silently lost some of your data.

