We are separating our cluster into data and client nodes and plan to put the client nodes behind a TCP load balancer so that applications can connect to the cluster via TransportClient. We have multiple applications, each with multiple hosts, persisting data at different rates into the same cluster.
Is it a good approach to route requests via a TCP load balancer, or are there any concerns?
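For context, here is a minimal sketch of the setup being asked about, assuming an ES 5.x `PreBuiltTransportClient`, a load-balancer VIP `es-lb.example.com` forwarding TCP 9300 to the client nodes, and a placeholder cluster name (all names are illustrative, not from any real deployment):

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class LbTransportClientSketch {
    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "my-cluster")        // placeholder cluster name
                .put("client.transport.sniff", false)     // sniffing would discover and bypass the LB
                .build();

        // Single address: the TCP load balancer VIP in front of the client nodes
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("es-lb.example.com"), 9300));

        // ... index / search via client ...
        client.close();
    }
}
```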
Just a note: we are moving to the REST client, which means that the transport layer will be deprecated for client usage in the future.
You should start considering load balancing the REST layer instead.
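A minimal sketch of the REST-based alternative, assuming the Java low-level REST client (5.x-era API) and two client-node HTTP endpoints (host names are placeholders). The REST client round-robins requests across the configured hosts itself, so a dedicated load balancer becomes optional:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class RestClientSketch {
    public static void main(String[] args) throws Exception {
        // List the client nodes' HTTP endpoints; the REST client
        // round-robins requests across them and fails over on error.
        RestClient restClient = RestClient.builder(
                new HttpHost("client-node-1", 9200, "http"),
                new HttpHost("client-node-2", 9200, "http"))
                .build();

        // Simple request to verify connectivity
        Response response = restClient.performRequest("GET", "/_cluster/health");
        System.out.println(response.getStatusLine());

        restClient.close();
    }
}
```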
Is it a good approach? Yes. I don't see anything wrong with that.
JSON-serialized data is larger in bytes than binary-serialized data.
More time is needed to [de]serialize JSON than binary.
And it is strange that Elasticsearch 5.x has no NodeClient. We used it to send data directly to shards instead of sending to each node in a round-robin manner.
@dadoonet I looked into TCP load balancing further and found that the transport client opens 14 connections to a host, and the load balancer may not map all 14 connections to the same host behind it. I assume the transport client expects all 14 connections to go to the same host. Please correct me if I'm wrong?
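One way to sidestep that connection-mapping concern is to skip the TCP load balancer and give the TransportClient the addresses of all client nodes directly; it then keeps its own connection pool per node and round-robins requests across them. A minimal sketch, assuming two client nodes and a placeholder cluster name:

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class DirectTransportClientSketch {
    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "my-cluster")   // placeholder cluster name
                .build();

        // Add every client node directly so the client, not an external
        // load balancer, distributes requests across them.
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("client-node-1"), 9300))
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("client-node-2"), 9300));

        // ... use client ...
        client.close();
    }
}
```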