Is there any way I can keep a field and its mapping in memory or cached so that lookups based on that field are faster, even when there are more than 1000 clauses specified for that field in the search query?
Have you profiled the query to confirm that retrieval from disk is what is slowing it down? Having that many clauses could in itself require a lot of processing and potentially be the bottleneck.
How many shards are you querying? How large are these? How many documents? What is the mapping for the field(s) you are querying?
Currently I am using a terms query, so the mapping is keyword. There are almost 90 shards that I am querying, and the total hits are around 80 million. Is there any way I could improve the search speed and the processing?
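For context, a terms query like the one described would look roughly like this. This is a minimal sketch: the field name `my_field` and the values are hypothetical, and the request body follows the standard Elasticsearch search API shape.

```python
import json

# Hypothetical list of 1000 clauses for a single keyword field.
clauses = [f"value-{i}" for i in range(1000)]

# All clauses go into one terms query; only 10 documents are returned.
query = {
    "size": 10,
    "query": {
        "terms": {
            "my_field": clauses,
        }
    },
}

# The body would be POSTed to the search endpoint of the index/alias.
print(json.dumps(query)[:60])
```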
What is the total size of the index? How large are the shards? How many documents does the index hold?
What is the size and specification of the cluster?
Does the cluster hold other data?
Is this the number of documents in the index that match the query? If so, how many are you retrieving?
Yes, actually I am running the query over an alias. I am returning only 10 documents. Each shard is at most 40 GB. Still, during load testing I am getting search latencies of more than 20 seconds when querying on that field.
For my business need I only have to query a single field with up to 1000 clauses, and I need better search latencies under load. Any ideas on how to do this?
I do not think you can control what is cached or kept in memory at that level of detail. The best way to ensure that data is served from memory is to ensure that your indices fit within the operating system page cache, which may be difficult if you have 90 shards with an average size of 40 GB.
You say you are querying a single field with a list of terms in a terms query. Is it always the same field you use? Does this field contain a single string or an array of strings mapped as keywords? Do you have clauses involving other fields as well?
If you have large documents where only a few fields are queried, it may be possible to create a new index that stores a minimal version of each document, holding only the information used for querying. This index could have a smaller number of primary shards and would hopefully be a lot more compact. You could then add a couple of dedicated nodes that hold just this index, ensuring it always fits within the operating system page cache. You would run the complex query against this smaller index first, and then fetch the full documents from the main index by document ID based on the results.
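The two-phase lookup described above can be sketched as two request bodies. This is only an illustration of the idea, not a definitive implementation: the field name `my_field` is hypothetical, the first body targets the compact index via the standard search API, and the second is an `_mget` body for the main index.

```python
def build_filter_query(terms, size=10):
    """Search body for the compact index: match the clauses but
    return only hit metadata (including _id), not the source."""
    return {
        "size": size,
        "_source": False,  # we only need the _id of each hit
        "query": {"terms": {"my_field": terms}},
    }


def build_fetch_request(doc_ids):
    """_mget body to retrieve the full documents from the main
    index using the IDs found in the first phase."""
    return {"ids": doc_ids}


# Phase 1: run the filter query against the small, memory-resident index
# and collect hit["_id"] from each result.
first = build_filter_query(["a", "b", "c"])

# Phase 2: POST this body to /main-index/_mget to get the full documents.
second = build_fetch_request(["doc-1", "doc-2"])
```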
Even if this is not feasible in production, it could be an interesting test, as it would show how fast the query could get with all data in memory.