By default, offset + limit (from + size in the search request) is capped at 10,000. The cap can be raised through the index.max_result_window index setting, but I would seriously advise against doing so.
When paginating this way, Elasticsearch has to do all of the following for every page: parse the query, build the search context, distribute the query to the applicable shards, collate the results, skip past $offset items, read out $limit items, and destroy the search context. The deeper we paginate, the more expensive each page is compared with the one before it.
Oy.
The 10,000 limit is there for a reason.
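To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every queried shard has to produce its own top offset + limit hits and that the coordinating node merges all of them; the function name and the 5-shard index are made up for illustration.

```python
def hits_collected(offset: int, limit: int, shards: int) -> int:
    """Rough upper bound on the hits merged by the coordinating node for one page.

    Assumption: each queried shard returns its own top (offset + limit) hits.
    """
    return shards * (offset + limit)


# Hypothetical index with 5 shards and 10 results per page.
for page in (1, 100, 1000):
    offset = (page - 1) * 10
    print(page, hits_collected(offset, 10, shards=5))
# prints:
# 1 50
# 100 5000
# 1000 50000
```

Page 1000 ends up sorting a thousand times more hits than page 1 just to return the same 10 documents.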
Thankfully, Elasticsearch has a scroll API, which reuses search context and position from one request to the next. You should use it when you need to paginate deeply.
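A minimal sketch of what a scroll loop might look like. The send callable is a hypothetical transport (method, path, body) returning the parsed JSON response, so plug in whatever HTTP client you use; make_fake_send is only an in-memory stand-in so the loop can be tried without a cluster, and the index name is invented.

```python
from typing import Callable, Iterator, List

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_all(send: Send, index: str, query: dict,
               page_size: int = 1000, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for `query`, one scroll page at a time."""
    # The first request opens the search context; sorting by _doc is the cheapest order.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "sort": ["_doc"], "query": query})
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            # Follow-up requests only carry the scroll id, not the query.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the search context instead of waiting for it to time out.
        send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def make_fake_send(docs: List[dict], calls: List[str]) -> Send:
    """In-memory stand-in for a cluster, only for trying out the loop."""
    state = {"pos": 0, "size": 0}

    def send(method: str, path: str, body: dict) -> dict:
        calls.append(f"{method} {path}")
        if method == "DELETE":
            return {"succeeded": True}
        if "size" in body:  # initial search request
            state["size"] = body["size"]
        page = docs[state["pos"]:state["pos"] + state["size"]]
        state["pos"] += len(page)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": page}}

    return send


calls: List[str] = []
docs = [{"_id": str(i)} for i in range(5)]
hits = list(scroll_all(make_fake_send(docs, calls), "my-index",
                       {"match_all": {}}, page_size=2))
print(len(hits), len(calls))  # prints: 5 5
```

Note that the query is sent once; every later page is just "give me more of scroll X", which is where the savings come from.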
Educate your users. You normally don't have to click "next page" 1000 times to get the result you were looking for. Think about Google or Qwant. Do you often go past page 1?
Add a way to change the sort order, so that the last page becomes the first page.
If your need is to extract all the data to do some other processing later, then as @yaauie said, the scroll API is the way to go. It has a big advantage: whatever happens on your index while you scroll (new documents being added, for example), you will get consistent results.
If you really need to do deep pagination, look at the search_after feature. It has been designed for that. Note that it basically only lets you do "next page", not "go to page 1476".
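For the search_after case, a small sketch of how the "next page" request body could be built from the last hit of the previous page. The helper name and the created_at and id fields are hypothetical; the important assumption is that the sort ends with a field that is unique per document, so the cursor is unambiguous.

```python
from typing import List, Optional


def next_page_body(query: dict, sort: List[dict], size: int,
                   last_hit: Optional[dict] = None) -> dict:
    """Build a search body; pass the previous page's last hit to continue after it."""
    body = {"size": size, "query": query, "sort": sort}
    if last_hit is not None:
        # Each hit of a sorted search carries its sort values under "sort".
        body["search_after"] = last_hit["sort"]
    return body


# Hypothetical fields: created_at for ordering, id as a unique tiebreaker.
sort = [{"created_at": "desc"}, {"id": "asc"}]
query = {"match_all": {}}

first = next_page_body(query, sort, size=10)
# Pretend this was the last hit returned for the first page.
last_hit = {"_id": "42", "sort": [1500000000000, "42"]}
second = next_page_body(query, sort, size=10, last_hit=last_hit)
print(second["search_after"])  # prints: [1500000000000, '42']
```

There is no offset anywhere in the second request, which is why this can only move forward from a known position.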
Yeah.. I get it. But I wonder why this limit is not applied to the "top hits aggregation". I am able to get more than 10,000 documents using a top hits aggregation.
Is it because aggregations handle this in a better way than direct querying, or did Elasticsearch miss validating the window size in aggregations?
Please help me understand why the window size limit is not considered in the "top hits aggregation". I am able to fetch more than 10,000 docs using this aggregation.
Is it because aggregations are better at handling it, or did Elasticsearch miss validating the window size in aggregations?
I answered too quickly, sorry. There is already a limit on the top_hits aggregation: the index.max_inner_result_window setting, which defaults to 100. This limit is per bucket, though, so it is still possible to return more than 10,000 documents if the parent aggregation returns more than 100 buckets.
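To make the per-bucket arithmetic concrete, here is a sketch with a terms aggregation wrapping a top_hits aggregation. The category field, the aggregation names, and the helper are invented for illustration; the only thing taken from above is the per-bucket limit of 100.

```python
def max_top_hits_docs(body: dict, parent: str, child: str) -> int:
    """Upper bound on documents returned: parent buckets x top_hits size per bucket."""
    parent_agg = body["aggs"][parent]
    buckets = parent_agg["terms"]["size"]
    per_bucket = parent_agg["aggs"][child]["top_hits"]["size"]
    return buckets * per_bucket


# Hypothetical request: 101 buckets, each returning the per-bucket maximum of 100 hits.
body = {
    "size": 0,  # no regular hits, only aggregation results
    "aggs": {
        "by_category": {
            "terms": {"field": "category", "size": 101},
            "aggs": {"docs": {"top_hits": {"size": 100}}},
        }
    },
}

print(max_top_hits_docs(body, "by_category", "docs"))  # prints: 10100
```

Each bucket stays within its own limit, yet the response as a whole can carry more than 10,000 documents.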
Oh ok..
I am using Elasticsearch 5.5, and that's why I was able to fetch them.
Thanks @jimczi and @dadoonet for taking the time to answer my question.
Have a nice day.