Hi everyone.
We're facing an issue where adding replicas drastically decreases query performance. Our setup consists of 3 nodes running ES 7.16 with 4GB of heap each.
We have around 1.6 million documents, each containing a single dense_vector field with 768 dimensions, around 23GB of data in total.
Our query is below. We're basically trying to do an exact nearest-neighbour search with a similarity threshold:
{
  "_source": false,
  "min_score": 0.7,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "Math.max(cosineSimilarity(params.embeddings, \"bertHeadlineEmbeddings\"), 0)",
        "params": {
          "embeddings": [
            // 768 floating point numbers here, omitted for clarity
          ]
        }
      }
    }
  },
  "size": 10
}
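For anyone who wants to reproduce this, here is a rough Python sketch of how the request body above can be built for an arbitrary vector, plus a plain-Python version of the score the script computes. The helper names build_query and clamped_cosine are made up for illustration; only the field name and the script source come from our query.

```python
import math

def build_query(embedding, min_score=0.7, size=10):
    """Build the script_score request body shown above for one query vector."""
    return {
        "_source": False,
        "min_score": min_score,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": 'Math.max(cosineSimilarity(params.embeddings, "bertHeadlineEmbeddings"), 0)',
                    "params": {"embeddings": list(embedding)},
                },
            }
        },
        "size": size,
    }

def clamped_cosine(a, b):
    """Reference version of the score: cosine similarity, clamped at 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return max(dot / (norm_a * norm_b), 0.0)
```

With min_score set to 0.7, only documents whose clamped cosine similarity is at least 0.7 are returned, and at most 10 of them.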
When running a simple test consisting of 2 consecutive batches of 50 parallel queries like the one above (each with a different embeddings parameter), we get response times that meet our acceptance criteria with the following two setups:
- 1 index, 6 shards, 0 replicas
- 6 indices (data is fairly evenly distributed but not completely so), 1 shard, 0 replicas
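In case the shape of the test matters, it looks roughly like the sketch below: 2 consecutive batches of 50 parallel queries, with per-request latency recorded. This is a simplified illustration, not our actual harness. The search argument is a placeholder for whatever function sends one request body to the cluster, and run_batches and summarize are names invented here.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def run_batches(search, bodies, batches=2, parallel=50):
    """Run consecutive batches of parallel searches; return latencies in seconds.

    `search` is a placeholder callable that sends one request body.
    """
    def timed(body):
        start = time.perf_counter()
        search(body)
        return time.perf_counter() - start

    latencies = []
    for b in range(batches):
        chunk = bodies[b * parallel:(b + 1) * parallel]
        # Each batch finishes completely before the next one starts.
        with ThreadPoolExecutor(max_workers=parallel) as pool:
            latencies.extend(pool.map(timed, chunk))
    return latencies

def summarize(latencies):
    """Average, median and max, the three numbers we compare across setups."""
    return {
        "avg": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }
```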
The problem is that the moment we introduce replicas, response times increase by 3x or 4x on average and no longer meet our acceptance criteria. The average, median and max response times all increase by that factor. It happens with the following setups:
- 1 index, 6 shards, 1 replica
- 1 index, 3 shards, 1 replica
- 6 indices, 1 shard, 1 replica
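For reference, one thing that differs between the setups with and without replicas is how much data and how many shard copies each node holds. A back-of-the-envelope sketch, assuming shards are balanced evenly across the 3 nodes and that each replica is a full extra copy of the 23GB (the function name is made up, and we have not established that this is what causes the slowdown):

```python
def per_node_footprint(primary_data_gb, primary_shards, replicas, nodes):
    """Estimate data volume and shard copies per node, assuming even balancing."""
    copies = 1 + replicas  # primaries plus one full copy per replica
    return {
        "data_gb": primary_data_gb * copies / nodes,
        "shard_copies": primary_shards * copies / nodes,
    }

no_replicas = per_node_footprint(23, primary_shards=6, replicas=0, nodes=3)
one_replica = per_node_footprint(23, primary_shards=6, replicas=1, nodes=3)
```

Under those assumptions each node goes from about 7.7GB and 2 shard copies to about 15.3GB and 4 shard copies once one replica is added.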
We've tried:
- Disabling adaptive replica selection
- Using the diagnostic events to ensure an even distribution of requests (this does seem to be the case: at the end of the test each node has received 33 requests, except for one that has received 34, making up the total of 100 requests).
- Using "profile": true, which doesn't help much other than confirming that all the timings go up by a lot.
- Using a custom preference in the query, as suggested in this other thread where the same issue seems to be raised. It didn't help.
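To be concrete about the attempts above, this is roughly what the requests look like, sketched in Python without any HTTP client. The index name and the preference string are placeholders, and using "persistent" rather than "transient" for the cluster setting is just an assumption for the example.

```python
from urllib.parse import urlencode

def disable_ars_request():
    """Method, path and body for turning off adaptive replica selection."""
    body = {"persistent": {"cluster.routing.use_adaptive_replica_selection": False}}
    return "PUT", "/_cluster/settings", body

def search_path(index, preference=None):
    """Search path, optionally with a custom preference string."""
    path = f"/{index}/_search"
    if preference is not None:
        path += "?" + urlencode({"preference": preference})
    return path

def with_profile(body):
    """Copy of a search body with profiling enabled."""
    return {**body, "profile": True}

# "headlines" and "test-session-1" are placeholder values.
example_path = search_path("headlines", preference="test-session-1")
```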
We're really confused by these results, and we're not sure what to try next (probably increase the heap in the nodes), but we do need replicas.
Any suggestions would be greatly appreciated, thanks in advance.