Drastic reduction in query performance when using replicas

Hi everyone.

We're facing an issue where adding replicas drastically decreases query performance. Our setup consists of 3 nodes running ES 7.16 with a 4GB heap each.

We have around 1.6 million documents that contain a single field, a dense_vector with 768 dimensions. That comes to around 23GB of data.

Our query is below. We're essentially doing an exact nearest neighbour search with a similarity threshold:

  {
    "_source": false,
    "min_score": 0.7,
    "query": {
      "script_score": {
        "query": {
          "match_all": {}
        },
        "script": {
          "source": "Math.max(cosineSimilarity(params.embeddings, \"bertHeadlineEmbeddings\"), 0)",
          "params": {
            "embeddings": [
              // 768 floating point numbers here, omitted for clarity
            ]
          }
        }
      }
    },
    "size": 10
  }
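For readers unfamiliar with script_score, here is a rough pure-Python sketch of what the query above asks for: score every document by clamped cosine similarity, drop anything below the threshold, and keep the top hits. The function names and the in-memory `docs` mapping are illustrative assumptions, not how Elasticsearch implements it. The point is that an exact search has to read every vector for every query.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_search(query, docs, min_score=0.7, size=10):
    """Brute-force search over docs (hypothetical mapping of id -> vector)."""
    scored = []
    for doc_id, vector in docs.items():
        # Mirrors Math.max(cosineSimilarity(...), 0) in the script.
        score = max(cosine_similarity(query, vector), 0.0)
        # Mirrors "min_score": hits scoring below the threshold are dropped.
        if score >= min_score:
            scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:size]


# Toy 2-dimensional data standing in for the 768-dimensional embeddings.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = exact_search([1.0, 0.0], docs)
```

Because `match_all` selects every document, the cost of each query grows with the number of vectors that have to be read and scored.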

When running a simple test consisting of 2 consecutive batches of 50 parallel queries like the one above (each with a different embeddings parameter), we get response times that meet our acceptance criteria with the following two setups:

  • 1 index, 6 shards, 0 replicas
  • 6 indices (data is fairly evenly distributed but not completely so), 1 shard, 0 replicas
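A test of this shape could be sketched roughly as follows. This is not the harness used in this thread: `run_query` is a hypothetical callable that sends one search with the given embedding, and the batch sizes are simply the defaults described above.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(run_query, embedding):
    """Return the wall-clock seconds taken by one query."""
    start = time.perf_counter()
    run_query(embedding)
    return time.perf_counter() - start


def run_batches(run_query, embeddings, batches=2, parallel=50):
    """Run consecutive batches of parallel queries and summarise latencies."""
    latencies = []
    for batch in range(batches):
        chunk = embeddings[batch * parallel:(batch + 1) * parallel]
        # One worker per query, so a whole batch is in flight at once.
        with ThreadPoolExecutor(max_workers=parallel) as pool:
            latencies.extend(
                pool.map(lambda e: timed_call(run_query, e), chunk)
            )
    return {
        "count": len(latencies),
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


# Stub query for illustration; a real run would call the search API here.
summary = run_batches(lambda embedding: None, [[0.0]] * 100)
```

Reporting mean, median and max together makes it easier to tell a uniform slowdown from a few slow outliers.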

The problem is that the moment we introduce replicas, response times increase by 3x to 4x on average and no longer meet our acceptance criteria. The average, median and max response times all increase by roughly that factor. It happens with the following setups:

  • 1 index, 6 shards, 1 replica
  • 1 index, 3 shards, 1 replica
  • 6 indices, 1 shard, 1 replica

We've tried:

  • Disabling adaptive replica selection
  • Using the diagnostic events to ensure an even distribution of requests (it does seem to be the case, at the end of the test each node gets 33 requests except for one that gets 34 to get to the total of 100 requests).
  • Using "profile": true, which doesn't help much other than to confirm that all the timings go up by a lot.
  • Using a custom preference in the query, as suggested in this other thread where the same issue seems to be raised. It didn't help.
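For reference, the first and last items in the list above correspond to requests along these lines. This is a minimal sketch that only builds the requests and sends nothing. The index name `headlines`, the preference string, and the choice of a persistent rather than transient setting are assumptions, not details from this thread.

```python
from urllib.parse import quote


def disable_adaptive_replica_selection():
    """Build a cluster settings request that turns off adaptive replica selection."""
    body = {
        "persistent": {
            "cluster.routing.use_adaptive_replica_selection": False
        }
    }
    return ("PUT", "/_cluster/settings", body)


def search_path(index, preference):
    """Build a search path carrying a custom preference string."""
    # Values starting with "_" are reserved for built-in preferences.
    if preference.startswith("_"):
        raise ValueError("custom preference must not start with '_'")
    return f"/{index}/_search?preference={quote(preference, safe='')}"


method, path, body = disable_adaptive_replica_selection()
example_path = search_path("headlines", "session-42")  # hypothetical values
```

Requests that share a custom preference string are routed to the same shard copies, which is why it is often suggested for getting more consistent cache behaviour.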

We're really confused by these results, and we're not sure what to try next (probably increase the heap in the nodes), but we do need replicas.

Any suggestions would be greatly appreciated, thanks in advance.

What are the 3 nodes? Master, Data, and Ingest? I would add another data node (at least)

Edit: Welcome to the community! :smiley:

When you add a replica, the total data volume in the cluster doubles. Is it possible that this reduces the operating system page cache hit rate, so that disk performance is causing the slowdown?

How much RAM does each node have? What type of storage are you using?

With one replica the total index size would be around 46GB (around 16GB per node) based on the numbers you provided. If you had 24GB RAM per node with a 4GB heap the full index should fit in the page cache. If you currently have less RAM than this it may be worthwhile increasing it to see if this helps.
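The arithmetic behind that estimate can be written out as a small sketch. It assumes shards are spread evenly across nodes and that all RAM outside the JVM heap is available as page cache, which is optimistic because the OS and other processes need memory too. The function and parameter names are illustrative.

```python
def page_cache_fit(primary_gb, replicas, nodes, ram_gb, heap_gb):
    """Estimate whether each node's share of the index fits in its page cache."""
    total_gb = primary_gb * (1 + replicas)   # each replica adds a full copy
    per_node_gb = total_gb / nodes           # assumes an even shard spread
    page_cache_gb = ram_gb - heap_gb         # upper bound on page cache
    return {
        "total_gb": total_gb,
        "per_node_gb": per_node_gb,
        "page_cache_gb": page_cache_gb,
        "fits": per_node_gb <= page_cache_gb,
    }


# Numbers from this thread: 23GB of primaries, 1 replica, 3 nodes, 4GB heap,
# and the 24GB of RAM per node suggested above.
estimate = page_cache_fit(primary_gb=23, replicas=1, nodes=3, ram_gb=24, heap_gb=4)
```

With these inputs the total is 46GB, each node holds a little over 15GB, and up to 20GB is left for the page cache, so the index should fit.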

Thank you, the 3 nodes are master ones. We can try adding a fourth one.

Thank you, the boxes have 8GB of RAM, and disks should be SSDs.

I forgot to mention that there are also other indices in the cluster. I'm just focusing on the performance issues in this particular one.

We'll try increasing RAM and VM heap.

For optimal performance I would recommend ensuring that your indices fit in the page cache. If you have other data in the cluster as well, that will naturally be tricky. Increasing RAM and querying only these indices would, however, show whether this is contributing or not.

Thanks a lot. It seems that doubling the heap did the trick, and I'm now getting comparable results with replicas.
