Optimize the latency for 1k OR clauses

Hey, I want to query on a specific field with 1000 clauses separated by OR, but the latencies I am getting are very high. Is there any setting in Elasticsearch to set a cache policy for a field's values, so that the values for that field are always kept in memory and queries on it, even with 1000 OR clauses, are optimised?

Also, is there any other special type of search query provided by Elasticsearch to achieve this?

A few things:

  1. You didn't mention which version of Elasticsearch you're currently using, but if you aren't on a newer release, there have been a good number of performance improvements that you could be missing out on.
  2. Take a look at: Tune for search speed | Elasticsearch Guide [8.3] | Elastic for some general possible improvements.
  3. Take a look at the profile API and see what your query is spending most of its time on. (Kibana has a nice UI for this: Profile queries and aggregations | Kibana Guide [8.3] | Elastic, if you don't want to look at the raw JSON.)
  4. Provide an example of the query you're currently running today (it doesn't need to be a full 1k-clause query, but it should be representative) and the use case. Without seeing what you're running and knowing why you're running it, others are limited in the feedback/input that they can provide.
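For reference, profiling doesn't need any special endpoint; you just add "profile": true to the search request body. A minimal sketch (the index alias and query here are placeholders, not from your cluster):

GET /my-index/_search
{
  "profile": true,
  "query": {
    "match": { "some_field": "some value" }
  }
}

The response then includes a "profile" section with a per-shard breakdown of where time was spent, which is what the Kibana UI visualizes.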

So I am currently using a terms query on a specific field to do this. It contains almost 1000 words. However, the latencies I am getting are more than 5 seconds, whereas for another basic search query I get latencies of milliseconds.
GET /alias/_search
{
  "query": {
    "terms": {
      "test_field": [ "random_1", "random_2", ..., "random_1000" ],
      "boost": 1.0
    }
  }
}

Is there any way to keep the values for this test_field in memory, resulting in quicker retrieval and improved latencies?

Also, please suggest other ways to improve the terms query.

Hmm, this query is fairly simple in nature, so I don't know if there is much to gain from an optimization perspective. I'd run it through the profile API to see what is taking the time here. The only thing that might save some compute time: if you don't need scoring, you could wrap the terms query in a bool/filter so the documents aren't scored.
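As a sketch, wrapping your query above in filter context would look like this (same alias and field names as your example; filter clauses skip scoring and are eligible for query caching):

GET /alias/_search
{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "test_field": [ "random_1", "random_2", ..., "random_1000" ]
        }
      }
    }
  }
}

Note this returns the same documents, just with a constant score of 0 instead of relevance scores, so only use it if ordering by relevance doesn't matter for your use case.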

Others here might have more ideas, but I think running through the profile API would be a good start in trying to find the issue.