Replicas drastically reduce performance of complex queries?

We are seeing some very strange query performance on most of our indices when replicas are enabled. Since we require high availability and all our data nodes run directly on local SSD drives, we use 2 replicas for all indices.

When firing very complex queries (5K+ lines of pretty-printed JSON) against a medium-sized index, we see 20-30 second response times and timeouts. When we disable all replicas, the response time drops to a consistent 3.5 seconds.

Our setup:

  • Elasticsearch 6.2.4
  • 8 data nodes on local SSD, 18 GB heap and 10 CPUs each, spread across 4 physical hypervisors
  • Index mapping with 300 fields, 20 nested.
  • 5M docs in index. ~100GB in primary shards.
  • 16 shards
  • Query has many dis_max parts, aggregations, sorting and highlighting. No filters. (A rough sketch of the query shape follows this list.)
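
For context, the query is shaped roughly like the sketch below; the field names (title, body, attributes, category, created_at) are placeholders rather than our real mapping, and the production query has thousands of these clauses.

```python
# Rough sketch of the query shape only (placeholder field names),
# not the actual 5K+ line production query.
query_body = {
    "query": {
        "dis_max": {
            "queries": [
                {"match": {"title": "example term"}},
                {"match": {"body": "example term"}},
                {
                    "nested": {
                        "path": "attributes",
                        "query": {"match": {"attributes.value": "example term"}},
                    }
                },
            ]
        }
    },
    "aggs": {"by_category": {"terms": {"field": "category.keyword"}}},
    "sort": [{"created_at": {"order": "desc"}}, "_score"],
    "highlight": {"fields": {"title": {}, "body": {}}},
    "size": 20,
}
```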

Why does having replicas have such a big impact on query performance? Whether we have 16*1 shard copies or 16*3 shard copies should not change the fact that a query hits 16 unique shards across all nodes (on average 2 per node).

The cluster is pretty much idle before and during the query. No noticeable load, CPU or disk usage on any node. All nodes have the same hardware and performance.

What I've tried already:

  • Reducing the number of shards to 8 or 4: this actually increases the response time, even without replicas. With replicas I see the same extreme query times (20-30+ seconds plus timeouts).
  • Tweaking the query: removing the aggregations, sorting, etc. This did affect overall response times, but the large difference between 0 and 2 replicas remains.
  • Profiling the query for different shard/replica setups: no single part stood out as the reason for the difference, but there is too much data to go over manually. Any good tools to analyze the Profile response? (One way to summarize it is sketched after this list.)
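
One way to make the Profile output more digestible is to flatten it and sum self-time per Lucene query type. A minimal sketch, assuming the standard profile.shards[].searches[].query[] layout of the Profile API response:

```python
# Sketch: sum self-time per Lucene query type across all profiled shards,
# so the most expensive components stand out without reading the raw JSON.
from collections import defaultdict

def summarize_profile(profile_response):
    totals = defaultdict(int)  # query type -> total self-time in nanoseconds

    def walk(node):
        # time_in_nanos includes children, so subtract child time to get
        # this node's own cost.
        child_time = sum(c["time_in_nanos"] for c in node.get("children", []))
        totals[node["type"]] += node["time_in_nanos"] - child_time
        for child in node.get("children", []):
            walk(child)

    for shard in profile_response["profile"]["shards"]:
        for search in shard.get("searches", []):
            for query_node in search["query"]:
                walk(query_node)

    for query_type, nanos in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{query_type:35s} {nanos / 1e6:10.1f} ms")
```

Running the same profiled query against the 0-replica and 2-replica setups and comparing the two summaries should narrow down where the extra time goes.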

Could the difference be requests round-robining across replicas vs. always hitting the same shard copies, with the attendant caching benefits?

I don't use Adaptive Replica Selection, so all routing is done in a uniform random round-robin style.

There should be no difference in the chance of each node servicing more or fewer shards for the same request, right?

  • With 8 nodes and 16*1 shards (0 replicas), each node handles the query for 2 shards on average
  • With 8 nodes and 16*3 shards (2 replicas), each node handles the query for 2 shards on average

Or is my statistics math way off here?

When a shard is held on only one node (i.e. one primary, no replicas), any caching (e.g. at the file-system or segment level) will make repeat queries faster.
When you have replicas, your queries will be randomized across those replicas, some of which will have empty caches for your query while others are warmed.
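
As a rough illustration (a simplified simulation sketch with random shard placement and uniform random copy selection, not how the real allocator works): the average number of shard copies each node handles per query is the same with or without replicas, but with 2 replicas a repeat query only lands on the same, already-warm copy about a third of the time.

```python
# Quick sanity check of the "2 shard copies per node on average" reasoning,
# and of how often a repeat query re-uses the same (warm) copy.
import random
from collections import Counter

NODES, SHARDS, COPIES, QUERIES = 8, 16, 3, 10_000

# Simplified placement: each shard's copies go on 3 distinct random nodes
# (the real allocator also balances copy counts per node).
placement = [random.sample(range(NODES), COPIES) for _ in range(SHARDS)]

per_node_counts = Counter()
same_copy_hits = 0
previous_choice = [None] * SHARDS

for _ in range(QUERIES):
    for shard, nodes in enumerate(placement):
        choice = random.choice(nodes)       # uniform random copy selection (no ARS)
        per_node_counts[choice] += 1
        if choice == previous_choice[shard]:
            same_copy_hits += 1
        previous_choice[shard] = choice

avg_per_node = sum(per_node_counts.values()) / (QUERIES * NODES)
repeat_rate = same_copy_hits / (QUERIES * SHARDS)
print(f"avg shard copies handled per node per query: {avg_per_node:.2f}")        # ~2.0
print(f"chance a repeat query lands on the same (warm) copy: {repeat_rate:.2f}")  # ~0.33
```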

You can use a client session ID as the custom value in the `preference` setting to route queries back to the same replica where possible and gain the benefit of a warm cache.
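
For example, with the Python client (a minimal sketch; the host and index name are placeholders, and `preference` is the standard search parameter carrying the per-session custom value):

```python
# Sketch: pin a user's searches to the same shard copies by passing their
# session ID as a custom preference value (any stable string works).
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

def search_for_session(session_id, body):
    # The same preference string always resolves to the same choice of shard
    # copies, so repeat queries from one session hit warm caches.
    return es.search(index="my-index", body=body, preference=session_id)
```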

So in general, as your replicas go up, you should also increase memory to account for the fact that the same shard data is cached in multiple nodes?

Are there any other ways to increase query performance for these very big queries? Even when providing enough memory to hold the entire shards in memory, I am still seeing seconds of query time.

Replicas will help with serving concurrent users' search loads, but of course only if you give them the compute resources they demand (spindles, cores and, yes, memory).
Session-based routing will ensure you make the best use of that memory, though, because it returns a user to the same choice of replica and increases the chance of cache hits (user sessions typically repeat queries when hitting "next page", etc.).

