Hi,
Elasticsearch acts as the data center of our system. Other applications, each of which holds one "Transport Client", query data from Elasticsearch.
ES version: 1.5.2
Our cluster has 30 data nodes and 3 master nodes.
My questions:
As the number of applications holding a "Transport Client" grows, the Elasticsearch cluster will have to hold more and more TCP connections.
How many transport clients can a cluster have? Is there a recommended limit on the number of "Transport Client" instances?
Do more "Transport Client" instances put more pressure on Elasticsearch?
The transport client is a normal client; it doesn't join the cluster or anything like that. That said, it is recommended that the transport client be a singleton in your application. It is thread-safe, so the same instance should be used for all your requests to Elasticsearch. The problems that may arise from using more instances are more on the client side than on the server side, though. On the server side, the main risk might be too many open connection handles.
Yes. The transport client is a singleton in each of our applications.
Hence, this problem doesn't exist in our application:
"The problems that may arise from using more instances are more on the client side though than on the server side."
Right now the number of TCP connections on our Elasticsearch servers is over 2K, and this number is not a problem. At what number of TCP connections might Elasticsearch server performance be affected?
I wonder why you are asking this question if your usage of transport client is the preferred one. Also, how do you monitor the number of tcp connections?
I wouldn't know what number of TCP connections could cause problems on the server side; that would also depend on how many resources you have and how loaded the cluster is.
The reason why I ask this question is that:
As I mentioned above, ES acts as the data center of our system. Many applications will query data from our ES. Therefore, we have two ways for applications to query data from ES.
Strategy 1. Each application holds a transport client and queries data from ES directly.
Strategy 2. Only a few internal applications hold transport clients and provide query services for other applications to invoke.
I prefer the first one, which is why I ask this question. If the number of TCP connections won't be a problem, that will give me confidence in Strategy 1.
In addition, “how do you monitor the number of tcp connections?”
I monitor this through the "_nodes/stats" URL, using the "network.tcp.curr_estab" metric.
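For anyone who wants to track the same metric, here is a minimal sketch in Python of reading "network.tcp.curr_estab" out of an already-parsed "_nodes/stats" response. The function names and the sample payload (node ids, node names, counts) are made up for illustration, and the sketch assumes the response nests the value under nodes -> <node id> -> network -> tcp, as the metric name above suggests. Also keep in mind that this counter is not limited to transport client connections, so treat the totals as a rough indicator only.

```python
import json


def established_tcp_per_node(stats):
    """Return {node name: curr_estab} from a parsed _nodes/stats body.

    Nodes that do not report network.tcp.curr_estab are skipped.
    """
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        tcp = node.get("network", {}).get("tcp", {})
        if "curr_estab" in tcp:
            # Fall back to the node id when the node has no name field.
            result[node.get("name", node_id)] = tcp["curr_estab"]
    return result


def total_established_tcp(stats):
    """Sum curr_estab over all nodes that report it."""
    return sum(established_tcp_per_node(stats).values())


# Hypothetical, heavily trimmed response body used only for illustration.
sample = json.loads("""
{
  "nodes": {
    "abc": {"name": "data-1", "network": {"tcp": {"curr_estab": 70}}},
    "def": {"name": "data-2", "network": {"tcp": {"curr_estab": 65}}},
    "ghi": {"name": "master-1"}
  }
}
""")

per_node = established_tcp_per_node(sample)
total = total_established_tcp(sample)
print(per_node)
print(total)
```

Recording the per-node values over time, rather than a single snapshot, makes it easier to see whether the count grows as more applications are added.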
I don't fully understand what the difference is between Strategy 1 and Strategy 2. Can you quantify it? How many transport client instances would you have with one versus the other? Did you have the chance to try both and see the difference in the monitored number of TCP connections?
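One rough way to put numbers on the two strategies is a back-of-the-envelope estimate: clients multiplied by the nodes each client connects to, multiplied by the connections each client opens per node. The Python sketch below does only that arithmetic. Every input is an assumption: the application counts are invented, the 30 nodes assume each client connects to all data nodes mentioned earlier in the thread, and the per-node connection count of 13 is a placeholder that should be checked against the transport settings of the version in use.

```python
def estimate_client_connections(num_clients, nodes_per_client, conns_per_node):
    """Estimate total TCP connections opened by transport clients.

    All three inputs are caller-supplied assumptions; this is plain
    multiplication, not a measurement.
    """
    if min(num_clients, nodes_per_client, conns_per_node) < 0:
        raise ValueError("inputs must be non-negative")
    return num_clients * nodes_per_client * conns_per_node


# Hypothetical inputs, for illustration only.
DATA_NODES = 30        # assumes every client connects to all 30 data nodes
CONNS_PER_NODE = 13    # placeholder; verify against your transport settings

strategy_1 = estimate_client_connections(100, DATA_NODES, CONNS_PER_NODE)  # 100 apps, one client each
strategy_2 = estimate_client_connections(5, DATA_NODES, CONNS_PER_NODE)    # 5 internal query services

print("Strategy 1:", strategy_1)
print("Strategy 2:", strategy_2)
print("Per data node, Strategy 1:", strategy_1 // DATA_NODES)
print("Per data node, Strategy 2:", strategy_2 // DATA_NODES)
```

Comparing an estimate like this with the monitored connection count after adding a few real clients would show whether the assumed per-node figure is in the right range.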