Logstash output to ES cluster - sharding?

Hi all,

My cluster currently involves:
Machine 1 : ES - co-ordinating node + Kibana
Machine 2 : ES - master + data + Logstash
Machine 3 : ES - master + data + Logstash
Machine 4 : ES - master + data + Logstash
Sharding : 1 primary per index, 1 replica, Logstash creates monthly indices.

The 3 Logstash instances (Machine 2, Machine 3, Machine 4) are set to pull from Kafka - which nodes should I set the ES output to?
I have come across articles stating to add all the ES-data-eligible nodes to this list.

My question is:
With 1 primary shard per index, what happens when a document is sent to the node containing the replica shard for that index?
What happens when I add another data-only ES + Logstash node to the cluster? What happens when I add another master+data ES + Logstash node? Do I include or exclude these nodes from all Logstash outputs?

Would it be better to send all Logstash outputs to the co-ordinating node instead?

Thank you!

Data nodes are the way to go. If the primary shard is not on the node that the client sends the document to, the request is forwarded internally to the node that holds it.

One of the ideas here is that you do not need to worry about topology. As a user you want a URL (or a list of URLs) to connect to, but you don't care whether your cluster has three nodes or a hundred.

So, why data nodes instead of coordinating nodes? If you hit a data node, there is a chance that the primary shard is local, so no forwarding is needed, whereas the coordinating node will always have to forward. Also, in this setup you would send all your data to a single coordinating node instead of spreading the load across several data nodes.
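As a rough sketch, the output section on each Logstash instance could list the three data nodes like this. The host names, port, and index name are placeholders assumed for illustration and are not taken from the original setup; only the `hosts` and `index` options of the elasticsearch output plugin are used.

```
output {
  elasticsearch {
    # Assumed host names for Machine 2, 3 and 4 (the master + data nodes).
    # Given a list, the plugin load-balances requests across these hosts.
    hosts => ["http://machine2:9200", "http://machine3:9200", "http://machine4:9200"]

    # Hypothetical index name; the date pattern produces one index per month.
    index => "logs-%{+YYYY.MM}"
  }
}
```

If another data node is added later, it would simply be appended to the `hosts` list. Whichever node receives a document still forwards it to the primary shard when that shard is not local.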

Hope that makes sense!

--Alex

That makes perfect sense, thank you so much!
