Connection pool

ton · June 11, 2015, 8:13am

Hi, is there any configuration for a connection pool in Elastic?
We are just moving from Oracle to Elastic.
It seems that we get low performance as we increase the number of client threads.

Christian_Dahlqvist · June 11, 2015, 8:48am

Could you please provide some additional information about the set-up and configuration of your cluster as well as how you are interacting with the cluster? This would help us troubleshoot any issues you are having.

ton · June 11, 2015, 11:31am

We use the basic Elastic configuration with the following changes:
index.merge.scheduler.max_thread_count: 1 (as we don’t use SSD)
heap size = 20GB
mlockall=true

We created a new index with 8 shards and 1 replica. We indexed 50M documents and started running JMeter REST API calls with filters. JMeter is running with 10 threads (that's the optimal number we found for this test). We notice that with 2 nodes we get about 450 calls per second and with 3 nodes we get 600 calls per second, but when we add a 4th node the throughput stays at 600 calls per second and doesn't increase.

The JMeter client doesn’t reach its memory / CPU / Network limits at all.
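To make the flattening easier to see, here is a small sketch of the per-node arithmetic for the numbers above. Python is used only for illustration, and the names `measurements` and `per_node_throughput` are made up for this example.

```python
# Calls per second observed at each cluster size (figures from this post).
measurements = {2: 450, 3: 600, 4: 600}


def per_node_throughput(results):
    """Return calls per second divided by node count for each cluster size."""
    return {nodes: calls / nodes for nodes, calls in results.items()}


if __name__ == "__main__":
    for nodes, rate in per_node_throughput(measurements).items():
        print(f"{nodes} nodes: {rate:.0f} calls/sec per node")
```

Per-node throughput falls from 225 to 200 to 150 calls per second, so the 4th node adds no capacity in this test.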

Christian_Dahlqvist · June 11, 2015, 5:21pm

Are you sending queries to all of the nodes? When you add nodes, have you tested increasing the number of threads?

If the application is query intensive, as in your benchmark, it may, depending on the size of your data, help to increase the number of replicas so that more shard copies are available to serve queries. If you have a small data set that can fit in memory, you may even want to go as far as setting the number of replicas so that all nodes hold all the data.
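As a rough sketch of that last point: with N data nodes, N - 1 replicas gives every node room for a full copy of each shard. The snippet below builds (but does not send) an update to the index setting `number_of_replicas`. The host `host1`, the index name `myindex`, and both function names are placeholders assumed for illustration, not anything from your cluster.

```python
import json
import urllib.request


def replicas_for_full_copy(data_node_count):
    """Replicas needed so each data node can hold one copy of every shard."""
    if data_node_count < 1:
        raise ValueError("need at least one data node")
    # One primary plus (N - 1) replicas gives N copies in total.
    return data_node_count - 1


def build_replica_request(base_url, index, replicas):
    """Build a PUT request for the index settings endpoint without sending it."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed values: 4 data nodes, placeholder host and index name.
    req = build_replica_request("http://host1:9200", "myindex", replicas_for_full_copy(4))
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To apply it against a live cluster: urllib.request.urlopen(req)
```

Keep in mind that every extra replica multiplies disk usage and indexing work, so this only makes sense when the data set is small.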

mosiddi · June 11, 2015, 5:40pm

Do you have a query node configured? It will load balance the requests across the data nodes, provided shard allocation is balanced.

javadevmtl · June 11, 2015, 5:53pm

Yeah make sure JMeter is load balancing to all the nodes.

To do that, simply create a CSV data set with a list of your nodes:

host1,9200
host2,9200
host3,9200
host4,9200

Under Variable Names in the CSV Data Set Config, put something like: host,port
Sharing mode should be: All threads

And then in your HTTP sampler just reference the variables ${host} and ${port}. Each thread will cycle through the CSV data set and use a different host per request.
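If it helps to see the idea outside JMeter, here is a minimal sketch of the same round-robin rotation in Python. The index name `myindex` and the function name `search_urls` are placeholders I made up; the hosts are the ones from the list above.

```python
from itertools import cycle

# Same host/port pairs as the CSV data set.
NODES = [("host1", 9200), ("host2", 9200), ("host3", 9200), ("host4", 9200)]


def search_urls(nodes, index):
    """Yield search URLs forever, rotating through the nodes in order."""
    for host, port in cycle(nodes):
        yield f"http://{host}:{port}/{index}/_search"


if __name__ == "__main__":
    urls = search_urls(NODES, "myindex")
    # The fifth request wraps around to host1 again.
    for _ in range(5):
        print(next(urls))
```

Each request goes to the next node in the list, so no single node ends up coordinating every query.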

ton · June 14, 2015, 11:59am

No, we only send to the master node.
Yes, we increased the number of threads but are not getting more throughput.

ton · June 14, 2015, 12:29pm

How do I configure the query node?

Christian_Dahlqvist · June 14, 2015, 1:44pm

If you have a dedicated master node, it should be left to manage the cluster and should not serve traffic. You should, as suggested, set up JMeter to connect and send requests to all data nodes directly.

ton · June 14, 2015, 2:19pm

OK, did that.
I now have a very high load average on all nodes, twice the number of cores (8 cores).
Please advise.
Maybe it is a cache setting?

mosiddi · June 15, 2015, 8:29am

A query node (client node) is one where node.data and node.master are both set to false.

Client nodes are smart load balancers that take part in some of the processing steps. Let's take an example:

We can start a whole cluster of data nodes which do not even start an HTTP transport by setting http.enabled to false. Such nodes will communicate with one another using the transport module. In front of the cluster we can start one or more "client" nodes which will start with HTTP enabled. These client nodes will have the settings node.data: false and node.master: false. All HTTP communication will be performed through these client nodes.

These "client" nodes are still part of the cluster, and they can redirect operations exactly to the node that holds the relevant data without having to query all nodes. However, they do not store data and also do not perform cluster management operations. The other benefit is the fact that for scatter / gather based operations (such as search), since the client nodes will start the scatter process, they will perform the actual gather processing. This relieves the data nodes to do the heavy duty of indexing and searching, without needing to process HTTP requests (parsing), overload the network, or perform the gather processing.
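To put the settings side by side, here is a small sketch that renders the lines each kind of node would carry in its configuration file. The helper names `node_settings` and `to_yaml` are invented for this example, and node.master is left out for data nodes because the description above does not fix it.

```python
def node_settings(role):
    """Return the settings described above for a 'client' or 'data' node."""
    if role == "client":
        # No data, no cluster management, HTTP enabled.
        return {"node.master": False, "node.data": False, "http.enabled": True}
    if role == "data":
        # Holds data and talks to other nodes over the transport module only.
        return {"node.data": True, "http.enabled": False}
    raise ValueError(f"unknown role: {role}")


def to_yaml(settings):
    """Render flat boolean settings as 'key: value' lines."""
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


if __name__ == "__main__":
    for role in ("client", "data"):
        print(f"# {role} node")
        print(to_yaml(node_settings(role)))
```

With this layout, JMeter would point only at the client nodes, since the data nodes no longer accept HTTP.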

ton · June 15, 2015, 9:15am

OK, thanks, I will do it.
Will it resolve the high I/O wait we experience?
