We are seeing some very strange query performance on most of our indices when replicas are enabled. Since we require high availability, and all our data nodes run directly on local SSD drives, we use 2 replicas for all indices.
When firing very complex queries (5K+ lines of pretty-printed JSON) against a medium-sized index, we see 20-30 second response times and timeouts. With all replicas disabled, the response time drops to a consistent 3.5 seconds.
Our setup:
Elasticsearch 6.2.4
8 data nodes on local SSD, 18GB heap, 10 CPUs each, spread across 4 physical hypervisors
Index mapping with 300 fields, 20 of them nested.
5M docs in index. ~100GB in primary shards.
16 shards
Query has many dis_max parts, aggregations, sorting, and highlighting. No filters.
Why do replicas have such a big impact on query performance? Whether we have 16×1 shards or 16×3 shards should not change the fact that a query hits 16 unique shards across all nodes (avg 2 per node).
The cluster is pretty much idle before and during the query. No noticeable load, CPU or disk usage on any node. All nodes have the same hardware and performance.
What I've tried already:
Reducing the number of shards to 8 or 4: this actually increases the response time with replicas disabled. With replicas enabled, I see the same extreme query times (20-30+ seconds, plus timeouts).
Tweaking the query, removing aggregations, sorting, and other parts: this did affect overall response times, but the large gap between 0 and 2 replicas remains.
Profiling the query for different shard/replica setups: no single part stood out as the cause of the difference, but there is too much data to go over manually. Are there any good tools for analyzing the Profile response?
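For reference, the profiled responses I'm comparing come from adding `"profile": true` to the search body, roughly like this (index and field names simplified from our real mapping):

```json
GET /my-index/_search
{
  "profile": true,
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "example" } },
        { "match": { "body": "example" } }
      ]
    }
  }
}
```

The response then includes a per-shard breakdown of query and collector timings, which is exactly the part that is too verbose to compare by hand across shard/replica setups.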
When a shard is held on only one node (i.e. one primary, no replicas), any caching (e.g. at the filesystem or segment level) will make repeat queries faster.
When you have replicas, your queries are randomized across copies, some of which will have empty caches for your query while others are warmed.
You can use a client session ID as the custom value in the `preference` setting to route queries back to the same replica where possible, and so gain the benefits of a warm cache.
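As a sketch, the custom `preference` value is passed as a query-string parameter on the search request (the session ID here is an arbitrary placeholder — any stable per-user string works):

```json
GET /my-index/_search?preference=session-abc123
{
  "query": {
    "match": { "title": "example" }
  }
}
```

As long as the same `preference` string is sent for a given user, their requests will be routed to the same shard copies, so repeated queries within a session hit warm caches instead of a cold replica.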
So in general, as the number of replicas goes up, you should also increase memory to account for the fact that the same shard data is cached on multiple nodes?
Are there any other ways to increase query performance for these very big queries? Even when providing enough memory to read the entire shards into memory, I still see query times of several seconds.
Replicas will help with serving concurrent users' search load, but of course only if you give them the compute resources they demand (spindles, cores, and yes, memory).
Session-based routing will ensure you make the best use of that memory, though, because it returns a user to the same choice of replica and increases the chance of cache hits (user sessions typically repeat queries when hitting "next page", etc.).