We have created an index and are querying the index to display the complete dataset. However, we have encountered performance issues, as we are dealing with 1 million records in response to user search queries. To address this, we are making changes to show only the top 10,000 records and are exploring the possibility of applying filters and sorting exclusively to these 10,000 records if necessary.
For example:
The current scenario is as follows: When we search for "doctor" in the title field, we retrieve 800,000 records. Due to the impact on our servers, we are heeding Elasticsearch's recommendation to limit the results to the top 10,000.
My questions are:
Can we disregard the remaining records whenever the count exceeds 10,000 for a given search?
Can we retrieve filter options for only the top 10,000 records and apply filters to those 10,000 only? Is this achievable?
If sorting is required, can we limit it to the top 10,000 records only?
Yes, you can limit your search results to the top 10,000 records. This matches Elasticsearch's default cap: the index.max_result_window setting limits from + size to 10,000 for a single search (the default size of a query is only 10). If you need to retrieve more than 10,000 results, you would typically use the Scroll or Search After API, but in your case it sounds like you want to avoid this due to performance concerns.
Here's an example of how you can limit your search results:
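A minimal sketch of such a request body, built as a Python dictionary so it can be serialized and sent to the `_search` endpoint. The index name `my_index` and the helper name `build_limited_search` are hypothetical; the `title` field and the search term come from the question.

```python
import json


def build_limited_search(term: str, size: int = 10_000) -> dict:
    """Build a match query on `title` that returns at most `size` hits."""
    return {
        "size": size,  # must stay within index.max_result_window (default 10,000)
        "query": {"match": {"title": term}},
    }


body = build_limited_search("doctor")
# Request body for: GET /my_index/_search  (my_index is a hypothetical index name)
print(json.dumps(body, indent=2))
```

With pagination in place you would normally pass a much smaller `size`, since `size: 10000` asks for all 10,000 hits in a single response.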
This will return the top 10,000 matches for "doctor" in the title field.
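As for your questions about filtering and sorting: yes, you can combine filters and sorting with the 10,000-result cap. Keep in mind that filters and sorting are applied at query time to every document that matches the query; the cap only limits how many of the resulting hits you can retrieve. A sorted query therefore returns the first 10,000 matching documents in sort order, not the 10,000 most relevant documents re-sorted.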
Here's an example of how you can apply a filter and sort your results:
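A minimal sketch of such a request body, again as a Python dictionary. The names `some_field`, `some_value`, and `another_field` are placeholders, the helper name is hypothetical, and the `term` filter assumes `some_field` is mapped as a keyword-style field that holds exact values.

```python
import json


def build_filtered_sorted_search(term, filter_field, filter_value,
                                 sort_field, size=10_000):
    """Match `term` in `title`, filter on an exact value, sort ascending."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text clause from the original question.
                "must": [{"match": {"title": term}}],
                # Non-scoring exact-value filter (placeholder field/value).
                "filter": [{"term": {filter_field: filter_value}}],
            }
        },
        "sort": [{sort_field: {"order": "asc"}}],
    }


body = build_filtered_sorted_search(
    "doctor", "some_field", "some_value", "another_field"
)
print(json.dumps(body, indent=2))
```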
This will return up to 10,000 matches for "doctor" in the title field that also have "some_value" in "some_field", sorted by "another_field" in ascending order.
@yago82 We don't want to show all 10k results at the same time; we have a pagination feature.
So I assume we should not set size to 10k? That is why we rely on the max window size of 10k; since the default value is already 10k, setting it makes no difference. Our main concerns are:
a. Ignore all results beyond the first 10k.
b. Get the filter options, and apply sorting, only for the first 10k.
@yago82 and @dadoonet Please let me know if my use case is unclear.
We provide information for the whole USA. If the user wants to see only VA info, we have filters, but by default we show the complete USA data. That is why we are seeing such large result sets.
And what is a user going to do with 20,000 documents, for example? Are they consuming those documents in another tool? Or displaying all the documents on the result page? Are they going to paginate over the result set?
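For illustration, the pagination described here can stay inside the 10k window by keeping from + size at or below the limit. A minimal sketch, assuming 1-based page numbers and a page size of 20; the helper name `page_request` and both of those choices are hypothetical, not something stated in the thread.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_request(term, page, page_size=20):
    """Return a search body for a 1-based page, or None outside the window."""
    start = (page - 1) * page_size
    if page < 1 or start >= MAX_RESULT_WINDOW:
        return None  # results beyond the first 10k are simply not offered
    # Shrink the last page so that from + size never exceeds the window.
    size = min(page_size, MAX_RESULT_WINDOW - start)
    return {
        "from": start,
        "size": size,
        "query": {"match": {"title": term}},
    }


first_page = page_request("doctor", 1)
last_page = page_request("doctor", 500)
beyond_window = page_request("doctor", 501)
```

The UI can use a `None` result to hide page links past the window, which ignores everything above 10k without changing any index settings.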