Concurrent Search in elasticsearch

prashant · June 27, 2016, 9:34am

Hello,

I have been using ES for quite a few time now, but currently I am facing a huge performance issue while searching the ES DB. I have an Index with 5 shards, 5 nodes each of 32 Gb Ram and 16 cores CPU with SAN storage. My index size is around 1.8 Tb and total documents are approx 480 million. We have a requirement to search 30 threads at a time, inside our code we are firing 7 queries (We have 7 different POOLS) in thread per request, i.e. 30*7 =210 request at a time. We have observed that while executing single pools we are getting good performance but when we are firing all pools queries all in multithreading our queries becomes slow, we debugged and found that now all pools are taking time, which earlier was very fast when running sequentially. Please help us how to improve the performance of the queries.

Thanks,
Prashant Verma

dadoonet · June 27, 2016, 10:03am

I'd try to keep every shard under 20gb. So you should probably increase the number of shards.
This is something you should test obviously because it depends on your data and searches.

This rule of thumb would give you 90 primary shards. With 16 cores, 5 to 6 nodes should be fine.
This is not including replicas. If you are using default, it means that you have to allocate 180 shards. If so, I'd probably increase (double?) the number of nodes.

But you have to test all that!

Note that increasing the number of shards needs that you reindex your data.

Something important you did not mention:

elasticsearch version
java version

prashant · June 27, 2016, 10:13am

Thanks for your prompt reply...we are using ES version 1.7.4 with Java 1.8, we cant increase the number of shards now as it has already taken 2 months to load the entire data. We have extracted our oracle DB tables and inserted in ES index.

prashant · June 27, 2016, 10:20am

We have used following setting for each nodes...
ES_HEAP_SIZE=20g
ES_MIN_MEM=20g
ES_MAX_MEM=20g

sysctl -w vm.max_map_count=1048576
ulimit -l unlimited
ulimit -n 200000

prashant · June 27, 2016, 10:34am

We are not using any replicas...

Sarwar · June 27, 2016, 3:03pm

If you have disk space available, then you can create a new index with the new shard numbers. This will allow you to reindex from the old index to the new index. There should already be tools to help with this such as logstash or python es reindex api.

Sarwar

prashant · June 28, 2016, 5:24am

Hello Sarwar,

I need to understand why to create new index with more shards. Will that improve the concurrent search performance improvement? We have 5 nodes and we have created 5 shards, load average and cpu utilization is not a problem for us. Every nodes is getting 90% load average and 90% cpu utilization.

Thanks,
Prashant Verma

Topic		Replies	Views
Slow search response time (low CPU utilization) Elasticsearch	7	3433	July 31, 2019
Improving Elasticsearch performance on a single node by increasing shards Elasticsearch	4	6678	July 6, 2017
Elasticsearch not performing search parallely Elasticsearch	1	385	October 12, 2017
Will increasing the number of shards to utilize more CPU and improve performance? Elasticsearch	3	1564	June 5, 2017
How to increase indexing speed? Elasticsearch	5	5366	April 18, 2017

Concurrent Search in elasticsearch

Related topics