Thanks for the reply. I've been hollering in the IRC channel all day!
When you say "master", is that a logical name you are giving your infrastructure or is this the Elasticsearch master-eligible / dedicated master node concept?
Originally, my second node did not join my cluster until I specified a dedicated role for each via the configuration like so:
node.data: false
node.master: true
one as the dedicated master...
node.data: true
node.master: false
and the new one as a data node, but I don't know if I have them configured correctly.
Then, when it finally did join, no replication occurred, but it did create the same directory structure, with no index data inside the directories:
$ ls /data/elasticsearch/nodes/0/indices/logstash-2018.07.20/
drwxr-xr-x 2 elasticsearch elasticsearch 24 Jul 24 01:25 _state
Also speaking of master nodes, 2 is not a great number since you need a quorum.
My goal was to just add a node to take the load off of my crashing node, but every time I restart the node (via service), more index translogs seem to get corrupted (java.nio.file.NoSuchFileException), and I wind up deleting indexes.
And be sure to set minimum master nodes correctly.
Both configurations are set with:
discovery.zen.minimum_master_nodes: 1
cluster.routing.allocation.enable: all
cluster.routing.rebalance.enable: all
cluster.routing.allocation.allow_rebalance: always
discovery.zen.ping.unicast.hosts: [xxx]
# discovery.zen.ping.unicast.hosts: [xxx] IP of second node
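For reference, my understanding of the usual quorum rule for discovery.zen.minimum_master_nodes is (number of master-eligible nodes / 2) + 1, rounded down. Here is a quick Python sketch of that arithmetic; the function name is just something I made up for illustration. With only one node set to node.master: true it gives 1, which matches what I have now, but the value would need revisiting if more master-eligible nodes are added:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1.

    Hypothetical helper, not part of Elasticsearch itself.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Show the value for a few cluster sizes (count master-eligible nodes only).
for n in (1, 2, 3):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```

Note that only nodes with node.master: true count here, so data-only nodes do not change the result.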
Once I see the second node rebalancing, I will add a new third node to the cluster.
Below are the current 2 node stats:
/_cat/health
1532398709 02:18:29 elasticsearch yellow 2 1 6271 6271 0 0 6271 0 - 50.0%
/_cat/nodes
172.29.100.223 172.29.100.223 67 98 0.24 d - metis.localdomain
172.29.100.124 172.29.100.124 41 87 0.04 - * titania.localdomain
/_cat/shards/logstash-2018.07.20
logstash-2018.07.20 1 p STARTED 5314 2.2mb 172.29.100.223 metis.localdomain
logstash-2018.07.20 1 r UNASSIGNED
logstash-2018.07.20 2 p STARTED 5213 2.1mb 172.29.100.223 metis.localdomain
logstash-2018.07.20 2 r UNASSIGNED
logstash-2018.07.20 3 p STARTED 5268 2.2mb 172.29.100.223 metis.localdomain
logstash-2018.07.20 3 r UNASSIGNED
logstash-2018.07.20 4 p STARTED 5229 2.2mb 172.29.100.223 metis.localdomain
logstash-2018.07.20 4 r UNASSIGNED
logstash-2018.07.20 0 p STARTED 5174 2.2mb 172.29.100.223 metis.localdomain
logstash-2018.07.20 0 r UNASSIGNED
And all the replicas show CLUSTER_RECOVERED if you /explain them.
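With 6271 shards it is hard to eyeball this, so here is a rough Python sketch of how I would tally the /_cat/shards output by shard type and state. It assumes the column order shown above (index, shard, p/r, state, ...) and uses a few lines copied from that output as sample input; the function name is my own:

```python
from collections import Counter

# A few lines copied from the /_cat/shards output above (sample input only).
CAT_SHARDS = """\
logstash-2018.07.20 1 p STARTED 5314 2.2mb 172.29.100.223 metis.localdomain
logstash-2018.07.20 1 r UNASSIGNED
logstash-2018.07.20 2 p STARTED 5213 2.1mb 172.29.100.223 metis.localdomain
logstash-2018.07.20 2 r UNASSIGNED
"""


def summarize_shards(text):
    """Count shards by (kind, state); kind is 'p' (primary) or 'r' (replica)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _index, _shard, kind, state = fields[:4]
        counts[(kind, state)] += 1
    return counts


summary = summarize_shards(CAT_SHARDS)
for (kind, state), n in sorted(summary.items()):
    print(kind, state, n)
```

On the sample lines this just confirms what the listing shows: every primary is STARTED and every replica is UNASSIGNED.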
I'm happy to remove the master/data config and let it configure itself, but since it takes forever to recover the indexes, I'd first like to know whether anyone notices something wrong.