Best way to insert data into Elasticsearch

When inserting data into Elasticsearch with Logstash (where Logstash does the parsing), is it better to:

  • Point all incoming index requests to any node (i.e. it doesn't really matter which node, as long as the request reaches the right cluster)
  • Point all incoming index requests to any master node (same as above, but only master nodes, not data nodes)
  • Point all incoming index requests to all nodes (Logstash's Elasticsearch output plugin can take an array of ES hosts, so this would mean listing all of them)
  • Something else?

Thanks

Send requests to any of the data nodes. Dedicated master nodes should not serve traffic.
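As a rough illustration of that rule, the sketch below picks indexing targets from a node inventory by keeping only nodes that hold the data role, so master-only nodes never receive index requests. The node names, the `NODES` list and the `indexing_targets` helper are hypothetical; the sketch assumes you already know each node's roles from your own cluster inventory.

```python
# Hypothetical inventory: names and role sets are made up for illustration.
NODES = [
    {"name": "master-1", "roles": {"master"}},
    {"name": "master-2", "roles": {"master"}},
    {"name": "master-3", "roles": {"master"}},
    {"name": "data-1", "roles": {"data"}},
    {"name": "data-2", "roles": {"data"}},
]


def indexing_targets(nodes):
    """Return the names of nodes that hold the data role.

    Dedicated master nodes (master role only) are filtered out so they
    are not asked to serve indexing traffic.
    """
    return [n["name"] for n in nodes if "data" in n["roles"]]


print(indexing_targets(NODES))  # ['data-1', 'data-2']
```

A node that carries both roles would still be kept, since the filter only checks for the presence of the data role.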

If I have 5 data nodes, is it better to send all insert requests to 1 node or spread them across all 5 with a load balancer external to ES?

It'd be better to round-robin them, yeah.
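A minimal sketch of what round-robin means here: each request goes to the next node in a fixed rotation, so with 5 data nodes every node gets roughly a fifth of the indexing load instead of one node taking all of it. The hostnames and the `assign` helper are hypothetical; 9200 is Elasticsearch's default HTTP port. Whether an external load balancer or the client's own host list performs the rotation, the resulting distribution follows this pattern.

```python
from itertools import cycle

# Hypothetical hostnames for five data nodes.
DATA_NODES = [f"es-data-{i}:9200" for i in range(1, 6)]


def assign(requests, hosts):
    """Pair each request with the next host in a fixed round-robin rotation."""
    rotation = cycle(hosts)  # endlessly repeats hosts in order
    return [(req, next(rotation)) for req in requests]


if __name__ == "__main__":
    batches = [f"bulk-{n}" for n in range(10)]
    for batch, host in assign(batches, DATA_NODES):
        print(batch, "->", host)
```

With 10 batches and 5 nodes, each node ends up handling exactly 2 batches.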

So what would be the difference (if any) between:

  • Using an external load balancer in front of the nodes to round-robin across them evenly
  • Using the ability of Logstash's Elasticsearch output plugin to accept multiple hostnames, and listing all of the nodes there instead of pointing it at the load balancer

Not much. Do whatever you are more comfortable with. The load balancer option lets you change the list of nodes without touching the Logstash configuration.
