Elasticsearch performance tuning with 3 nodes

Hi,

Below is the configuration of my Elastic Stack.

It is a cluster of 3 nodes, all of which are master-eligible as well as data nodes.

Each node's system configuration:

RAM: 8 GB
Cores: 4

I have allocated a 4 GB heap to each node.

Now to the problem.

I am load testing an Elasticsearch query with 1000 concurrent users over 1 second. Below is the query I am using:

{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "userLogin": {
              "query": "XXXX",
              "slop": 0,
              "zero_terms_query": "NONE",
              "boost": 1
            }
          }
        },
        {
          "match_phrase": {
            "targetSystemId": {
              "query": "3000",
              "slop": 0,
              "zero_terms_query": "NONE",
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}

The above query will always return a single unique result.

Now to the results:

Throughput is 252.44 requests per second, with a 1.15% error rate (connection timeouts).

Even with the default Elasticsearch configuration, a throughput of 252.44 seems very low to me; I need to get it to at least 1000 requests per second. Please suggest how this can be done.

See here: https://medium.com/kariyertech/elasticsearch-cluster-sizing-and-performance-tuning-42c7dd54de3c

Thanks, I will look into this.

To troubleshoot the issue, my plan is to start with a single index with the default configuration, run the load test, and then increase shards, indices, and nodes as required.

Is that the right approach to troubleshoot the issue?

I would recommend storing the two pieces of information you are filtering on in separate keyword-mapped fields and then using term queries instead of match_phrase.
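A rough sketch of what that could look like (my_index is just a placeholder name, and the mapping assumes a 7.x-style single-type index):

# Map both fields as keyword so they are indexed as single exact values.
PUT my_index
{
  "mappings": {
    "properties": {
      "userLogin": { "type": "keyword" },
      "targetSystemId": { "type": "keyword" }
    }
  }
}

# Exact-value term queries in a filter context skip scoring and can be cached.
GET my_index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "userLogin": "XXXX" } },
        { "term": { "targetSystemId": "3000" } }
      ]
    }
  }
}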

In order to give additional suggestions it would be good to know how many shards your data is distributed across and how much space this takes up on disk.

Thanks for the reply,

Below is an alternate query I already tried, but there is not much difference in performance:

{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "userLogin": "XXXXXXXX"
              }
            },
            {
              "term":{
                "targetSystemId": "3000"
              }
            }
          ]
        }
      }
    }
  }
}

I have a total of 3 nodes and 7 indices, each with the default configuration of 1 primary shard and 1 replica shard.

Below is the size taken by each index on the primary as well as the replica shard:

  1. First entity: 47kb (first and third nodes)
  2. Second entity: 5mb (first and second nodes)
  3. Third entity: 130mb (first and third nodes)
  4. Fourth entity: 1mb (first and second nodes)
  5. Fifth entity: 3.3mb (first and second nodes)
  6. Sixth entity: 36.2kb (third and first nodes)
  7. Seventh entity: 1mb (second and first nodes)

Please let me know how I can improve the performance.

Thanks a lot

So you have 7 very small indices and a total of 14 shards? Why did you go for 7 indices instead of a single one?

As long as you do not have any mapping conflicts I would recommend that you reindex all your data into a single index and set the number of replicas so that all data nodes hold a copy of the data. Then send queries distributed across all data nodes with a local preference.
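A rough sketch of what that could look like, assuming 3 data nodes (combined_index and first_entity are placeholder names):

# Reindex each existing index into the combined one (repeat per source index).
POST _reindex
{
  "source": { "index": "first_entity" },
  "dest": { "index": "combined_index" }
}

# With 3 data nodes, 1 primary + 2 replicas puts a full copy on every node.
PUT combined_index/_settings
{
  "index": { "number_of_replicas": 2 }
}

# preference=_local favours shard copies on the node that receives the request.
GET combined_index/_search?preference=_local
{
  "query": { "match_all": {} }
}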

All 7 indices serve different purposes at the application level, so I basically cannot merge them into one. Please suggest if anything else can be done.

One more thing: when load testing with Apache JMeter at 7000 concurrent users, I get the error below.

	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:156)
	at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$JMeterDefaultHttpClientConnectionOperator.connect(HTTPHC4Impl.java:326)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.executeRequest(HTTPHC4Impl.java:850)
	at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.sample(HTTPHC4Impl.java:561)
	at org.apache.jmeter.protocol.http.sampler.HTTPSamplerProxy.sample(HTTPSamplerProxy.java:67)
	at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1282)
	at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1271)
	at org.apache.jmeter.threads.JMeterThread.doSampling(JMeterThread.java:627)
	at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:551)
	at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:490)
	at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:257)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.net.ConnectException: Connection timed out: connect
	at java.net.DualStackPlainSocketImpl.connect0(Native Method)
	at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
	at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
	at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
	at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
	at java.net.PlainSocketImpl.connect(Unknown Source)
	at java.net.SocksSocketImpl.connect(Unknown Source)
	at java.net.Socket.connect(Unknown Source)
	at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:75)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
	... 19 more

10.72.21.40 is a data node (currently the first node); the second node is currently the master node.

Start with a low concurrency level and gradually increase it as long as query latency is acceptable. That will give you an idea of the level of concurrent queries your cluster can handle. If you cannot consolidate your indices, which would make querying far more efficient, you may need more CPU cores to handle more load in parallel.

Thanks for the prompt reply.

Regarding concurrent queries, the cluster can handle around 5500 concurrent users.

As for more CPU cores: each node already has 4, and if utilization is not spiking, is there any need for more? Currently CPU utilization never goes above 45%, which means the CPU is still not used to its full potential, right?

If CPU is not the bottleneck, try to find out what is. Given your low data volume it should not be disk I/O but could perhaps be networking.
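One concrete thing worth checking (a sketch of one diagnostic, not a full diagnosis) is whether the search thread pools are queueing or rejecting requests under load, since rejections show up client-side as errors:

# Non-zero rejected counts mean the search queue overflowed under load.
GET _cat/thread_pool/search?v&h=node_name,active,queue,rejected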

Thanks a lot, I will look into the network side.

But what about the errors? The rate is 30%; can that be due to networking?

Are you sending requests to all nodes in parallel?

I am sending all the requests to a single data node; as far as I know, Elasticsearch distributes the load across the cluster, right?

You should distribute it across all data nodes.

Can you let me know how to do that? It would be greatly helpful.

Thanks a lot

That is something you need to set up in JMeter.

OK, thanks a lot, I will do that.

Hi Christian, in the logs I see something like this:

{ml.machine_memory=8191995904, ml.max_open_jobs=20, xpack.installed=true}

Is this a matter of concern, and is it possible to disable ML since I am not using it?
