Max Window Size is Set to 10000 but the Terms aggregation is giving a single filter value larger than max window size.

We have created an index and are querying the index to display the complete dataset. However, we have encountered performance issues, as we are dealing with 1 million records in response to user search queries. To address this, we are making changes to show only the top 10,000 records and are exploring the possibility of applying filters and sorting exclusively to these 10,000 records if necessary.

For example:

The current scenario is as follows: When we search for "doctor" in the title field, we retrieve 800,000 records. Due to the impact on our servers, we are heeding Elasticsearch's recommendation to limit the results to the top 10,000.

My questions are:

  1. Can we disregard other records if the count exceeds 10,000 for each different search?

  2. Can we retrieve filter options for the top 10,000 records only, and apply filters to just those 10,000? Is this achievable?

  3. If sorting is required, can we limit it to the top 10,000 records only?

Hi,

Yes, you can limit your search results to the top 10,000 records. This is the default value of the index.max_result_window setting, which caps from + size for a single search request. If you need to retrieve more than 10,000 results, you would typically use the Scroll API or the search_after parameter, but in your case, it sounds like you want to avoid this due to performance concerns.

Here's an example of how you can limit your search results:

GET /your_index/_search
{
  "query": {
    "match": {
      "title": "doctor"
    }
  },
  "size": 10000
}

This will return the top 10,000 matches for "doctor" in the title field.
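
If the results are shown page by page rather than fetched in one request, the same cap applies to from + size. A minimal sketch, assuming a page size of 10 and the default index.max_result_window of 10,000 (your_index is the same placeholder as above); this asks for page 3:

```console
GET /your_index/_search
{
  "query": {
    "match": {
      "title": "doctor"
    }
  },
  "from": 20,
  "size": 10
}
```

With these values the last page that can be requested starts at from 9990. A request where from + size exceeds 10,000 is rejected unless the index setting is raised. In Elasticsearch 7 and later, hits.total is by default also counted exactly only up to 10,000 and reports the relation "gte" beyond that.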

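As for your questions about filtering and sorting: yes, you can combine both with the 10,000 limit, but note the order in which things happen. Filters and sorting are applied at query time to every document that matches the query, and size then caps how many of those hits are returned. You get the first 10,000 hits of the filtered, sorted result set, not a filter or sort over a previously chosen top 10,000.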

Here's an example of how you can apply a filter and sort your results:

GET /your_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "doctor"
        }
      },
      "filter": {
        "term": {
          "some_field": "some_value"
        }
      }
    }
  },
  "sort": [
    { "another_field": "asc" }
  ],
  "size": 10000
}

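This will return up to 10,000 documents that match "doctor" in the title field and have "some_value" in "some_field", sorted by "another_field" in ascending order. Because the sort replaces relevance ranking, these are the first 10,000 documents by "another_field", not the 10,000 most relevant ones.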
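
One caveat for the filter options: a terms aggregation counts every document that matches the query, not only the hits that size returns, so a single bucket can report more than 10,000 documents. If the counts should cover only the best-scoring matches, one possible approach is to wrap the terms aggregation in a sampler aggregation. The sketch below assumes some_field is a keyword field and that results are ranked by relevance score; the aggregation names top_docs and filter_options are made up for illustration.

```console
GET /your_index/_search
{
  "query": {
    "match": {
      "title": "doctor"
    }
  },
  "size": 0,
  "aggs": {
    "top_docs": {
      "sampler": {
        "shard_size": 10000
      },
      "aggs": {
        "filter_options": {
          "terms": {
            "field": "some_field",
            "size": 50
          }
        }
      }
    }
  }
}
```

shard_size is applied per shard, so on an index with several shards the sample can hold up to 10,000 documents from each shard. The sample is chosen by score, so it will not line up with the first 10,000 hits of a search sorted on another field.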

Welcome!

I'm curious. Why would an end user need to see 10000 documents instead of just looking at the first ones?

Thank you @yago82 and @dadoonet for your reply.

@yago82 we don't want to show all 10k results at the same time; we have a pagination feature.

So I assume we cannot simply set size to 10k? That is why we set the max window size to 10K, although since the default value is already 10K it does not make any difference. Our main concerns are:

a. Ignore all results beyond the first 10k.
b. Get the filters, or apply sorting, only for the first 10k.

@yago82 and @dadoonet Please let me know if you haven't got my use case.

We provide information for the whole USA. If a user wants to see only VA information, we have filters, but by default we show results for the complete USA. That is why we are seeing such large result sets.

And what is a user going to do with 20000 documents, for example? Are they consuming those documents in another tool? Or displaying all the documents on the result page? Are they going to paginate over the result set?

We have the pagination feature. We don't display more than 500 records per page. The default page size is 10.

So how does a user navigate to page 9990? By chance?
Or does he click 9990 times on the next button?

I am sorry to say this, but yes we are using pagination to navigate.

With that in place, we are trying to retrieve only the top 10k results, and then:

  1. Get the Filters for top 10K
  2. Perform Sorting if required for only the top 10K

Can we achieve this?

I guess you can sort on the client side.

Thank you. How can I achieve the filters?

I guess as mentioned here: Max Window Size is Set to 10000 but the Terms aggregations is giving single filter value larger than max window size. - #2 by yago82

If it does not work for you, please provide a reproduction script from where we can iterate.
