We used to have just a single node in our cluster. In the past 2 weeks, we added 2 more nodes. Now we want to decommission the original node because the hardware is aging. But I'm not confident that all our data has actually been distributed across the other machines in the cluster.
I can't find a post that explains how to confirm this.
I believe that doing a _cat/shards will tell me. Here is the output from that:
$ curl -XGET 'localhost:9200/_cat/shards/our_index_1,our_index_2,our_index_3?pretty'
our_index_1 1 r STARTED 28380731 17.3gb machine-02-ip elasticsearch-02
our_index_1 1 p STARTED 28380731 17.3gb machine-03-ip elasticsearch-03
our_index_1 3 r STARTED 28378851 17gb machine-03-ip elasticsearch-03
our_index_1 3 p STARTED 28378848 17gb machine-01-ip elasticsearch-01
our_index_1 2 r STARTED 28385815 17.5gb machine-03-ip elasticsearch-03
our_index_1 2 p STARTED 28385815 17.4gb machine-01-ip elasticsearch-01
our_index_1 4 r STARTED 28370118 17gb machine-02-ip elasticsearch-02
our_index_1 4 p STARTED 28370114 16.9gb machine-03-ip elasticsearch-03
our_index_1 0 r STARTED 28378628 16.9gb machine-02-ip elasticsearch-02
our_index_1 0 p STARTED 28378628 16.9gb machine-01-ip elasticsearch-01
our_index_2 0 p STARTED 2339117 1.4gb machine-03-ip elasticsearch-03
our_index_2 0 r STARTED 2341647 1.5gb machine-01-ip elasticsearch-01
our_index_3 1 r STARTED 1928 8mb machine-03-ip elasticsearch-03
our_index_3 1 p STARTED 1928 8.5mb machine-01-ip elasticsearch-01
our_index_3 3 r STARTED 1965 6.7mb machine-03-ip elasticsearch-03
our_index_3 3 p STARTED 1965 7.5mb machine-01-ip elasticsearch-01
our_index_3 2 r STARTED 2011 7mb machine-02-ip elasticsearch-02
our_index_3 2 p STARTED 2011 8.5mb machine-03-ip elasticsearch-03
our_index_3 4 p STARTED 2049 6.5mb machine-02-ip elasticsearch-02
our_index_3 4 r STARTED 2049 6.6mb machine-03-ip elasticsearch-03
our_index_3 0 r STARTED 1956 7.7mb machine-02-ip elasticsearch-02
our_index_3 0 p STARTED 1956 9.2mb machine-01-ip elasticsearch-01
It looks like every shard has copies on only 2 of the 3 machines; no shard is on all 3. So in theory, if I turn off machine 1, some shards will be left with a single copy (on either machine 2 or machine 3).
Is this correct? I'm not sure it's what we want. Is this a configuration issue?
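To sanity-check the pasted table mechanically rather than by eye, a small script can group the copies of each shard by node and report which shards would be left with a single surviving copy if one node went away. This is just a sketch that assumes the whitespace-separated `_cat/shards` column order shown above (index, shard, prirep, state, docs, store, ip, node); the sample below uses a trimmed subset of the output:

```python
# Sketch: parse whitespace-separated `_cat/shards` output and check which
# shards would be left with one copy if a given node were shut down.
from collections import defaultdict

# Trimmed sample of the _cat/shards output pasted above.
CAT_SHARDS = """\
our_index_1 1 r STARTED 28380731 17.3gb machine-02-ip elasticsearch-02
our_index_1 1 p STARTED 28380731 17.3gb machine-03-ip elasticsearch-03
our_index_2 0 p STARTED 2339117 1.4gb machine-03-ip elasticsearch-03
our_index_2 0 r STARTED 2341647 1.5gb machine-01-ip elasticsearch-01
our_index_3 0 r STARTED 1956 7.7mb machine-02-ip elasticsearch-02
our_index_3 0 p STARTED 1956 9.2mb machine-01-ip elasticsearch-01
"""

def copies_per_shard(cat_output):
    """Map (index, shard) -> set of node names holding a STARTED copy."""
    shards = defaultdict(set)
    for line in cat_output.strip().splitlines():
        index, shard, _prirep, state, _docs, _store, _ip, node = line.split()
        if state == "STARTED":
            shards[(index, shard)].add(node)
    return shards

def at_risk_if_down(cat_output, node_to_stop):
    """Shards that would have at most one surviving copy without that node."""
    return sorted(
        key for key, nodes in copies_per_shard(cat_output).items()
        if len(nodes - {node_to_stop}) <= 1
    )

print(at_risk_if_down(CAT_SHARDS, "elasticsearch-01"))
# → [('our_index_2', '0'), ('our_index_3', '0')]
```

With 1 replica per shard (2 copies total), every shard that has a copy on the node you stop will drop to a single copy, so the cluster goes yellow but no data is lost as long as the other two nodes stay up.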
There are plenty of shards on the 3rd node; here are just a few of them:
our_index_1 1 p STARTED 28380731 17.3gb machine-03-ip elasticsearch-03
our_index_1 3 r STARTED 28378851 17gb machine-03-ip elasticsearch-03
our_index_1 2 r STARTED 28385815 17.5gb machine-03-ip elasticsearch-03
If your cluster is green, and every index has at least one replica configured, then you are fine.
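If the goal is to retire the old node without any yellow period, one common approach is to first exclude it from shard allocation and wait for the cluster to go green again before shutting it down. A sketch of the cluster settings body (using the node name from the output above):

```json
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "elasticsearch-01"
  }
}
```

Elasticsearch will then move all shard copies off `elasticsearch-01`; once `_cat/shards` shows nothing on it and health is green, the node can be stopped safely. Remember to clear the exclusion setting afterwards.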
To follow up on this. I thought that to 'turn off' one of the nodes (machine-01, which happened to be the master node) I simply needed to change cluster.initial_master_nodes and discovery.seed_hosts on machine-02 to point to itself instead of machine-01. machine-03 already has machine-02 as the initial master and seed host.
But when I do that, machine-02 is no longer part of the cluster:
For a small cluster it's fine to set node.master: true on all nodes and list them all in cluster.initial_master_nodes. That makes them all master-eligible, which helps when turning nodes off and on. This is the setup that works best for me.
You should not be changing cluster.initial_master_nodes after the cluster has formed. Simply remove it entirely from the config file. Quoting the docs:
You should not use this setting when restarting a cluster or adding a new node to an existing cluster.
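Putting the two replies above together, a per-node elasticsearch.yml for a small 3-node cluster might look roughly like this (a sketch; the cluster name is a placeholder, and the hosts/node names are taken from the thread):

```yaml
# elasticsearch.yml on each of the three nodes
cluster.name: our-cluster
node.name: elasticsearch-02        # unique per node
node.master: true                  # all three nodes master-eligible
node.data: true                    # (newer versions use node.roles instead)
discovery.seed_hosts:
  - machine-01-ip
  - machine-02-ip
  - machine-03-ip
# cluster.initial_master_nodes is only for bootstrapping a brand-new
# cluster; remove it once the cluster has formed.
```

With all three nodes master-eligible and every node listed in discovery.seed_hosts, any single node (including the current master) can be stopped and the remaining two can elect a new master without config changes.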