My cluster currently consists of:
Machine 1 : ES - coordinating node + Kibana
Machine 2 : ES - master + data + Logstash
Machine 3 : ES - master + data + Logstash
Machine 4 : ES - master + data + Logstash
Sharding : 1 primary shard and 1 replica per index; Logstash creates monthly indices.
The 3 Logstash instances (Machine 2, Machine 3, Machine 4) are set to pull from Kafka. Which nodes should I list as hosts in the Elasticsearch output?
I have come across articles recommending that all data-eligible ES nodes be added to this list.
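For reference, this is roughly what I understand that advice to mean for the output block on each Logstash instance. It is only a sketch: the hostnames, the port, and the index name pattern are placeholders I made up for illustration, not my actual values.

```
output {
  elasticsearch {
    # Placeholder hostnames standing in for Machines 2-4 (the data-eligible nodes)
    hosts => ["machine2:9200", "machine3:9200", "machine4:9200"]
    # Monthly indices, as described above; the name pattern is an assumption
    index => "logstash-%{+YYYY.MM}"
  }
}
```

The alternative I am considering would list only Machine 1 (the coordinating node) in hosts.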
My questions are:
With 1 primary shard per index, what happens when a document is sent to the node containing the replica shard for that index?
What happens when I add another data-only ES + Logstash node to the cluster? What happens when I add another master + data ES + Logstash node? Do I include or exclude these nodes from all Logstash outputs?
Would it be better to send all Logstash outputs to the coordinating node instead?