Searching 1M documents in Elasticsearch using pagination

Tripti · June 27, 2017, 5:10pm

I have loaded around 1 TB of data into Elasticsearch.
For searching, I tried the following approaches:

"from+size": The default index.max_result_window is 10000, but I wanted to page from offset 100000, so I set index.max_result_window to 100000. Then I searched with from=100000 and size=10, but the heap filled up.
Scroll API: We need to specify a time window for keeping the search context alive, and keeping the older segments alive requires more file handles, so it again consumes the memory configured on the cluster's nodes.
search_after: I tried sorting documents on _uid, but it gives me the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[fielddata] Data too large, data for [_uid] would be [13960098635/13gb], which is larger than the limit of [12027297792/11.2gb]",
        "bytes_wanted": 13960098635,
        "bytes_limit": 12027297792
      }
    ]
  },

What can be done to resolve this error? And which is the most efficient way to page through a large result set (e.g. 100,000 hits or more)?
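For context on why the deep from+size page filled the heap, here is a minimal sketch of the arithmetic, assuming the usual query-then-fetch behaviour: every shard builds its own top from + size hits, and the coordinating node merges all of them before keeping only `size`. The function name and the 5-shard cluster in the usage line are illustrative assumptions, not details from this thread.

```python
def coordinator_merge_count(from_: int, size: int, shards: int,
                            max_result_window: int = 10_000) -> int:
    """Hits the coordinating node must merge to serve one from/size page.

    Each shard collects its own top (from_ + size) hits; the coordinating
    node merges shards * (from_ + size) of them and then keeps only `size`.
    """
    window = from_ + size
    if window > max_result_window:
        # Elasticsearch rejects such a request with
        # "Result window is too large, from + size must be less than or equal to ..."
        raise ValueError(
            f"from + size = {window} exceeds index.max_result_window = {max_result_window}"
        )
    return shards * window


# Hypothetical cluster: 5 shards, max_result_window raised high enough for the request.
print(coordinator_merge_count(100_000, 10, shards=5, max_result_window=200_000))  # 500050
```

So a single page of 10 hits at offset 100000 costs about half a million merged entries on this hypothetical cluster, and every deeper page costs more, which is why raising index.max_result_window only moves the problem onto the heap.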

polyfractal · June 30, 2017, 2:54pm

The Scroll API is the proper way to do deep pagination. The problem with search_after is that it's stateless: each request sees the index as it exists at the moment that request runs. This means ongoing updates, deletes, and new documents will show up in the next pagination request and can shift the order, duplicate results, etc.
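If search_after is used anyway, the circuit breaker error in the question comes from sorting on `_uid`, which loads that field into fielddata on the heap. One common workaround, sketched here under stated assumptions, is to sort on a unique keyword field, since keyword fields have doc_values by default and sort values are then read from on-disk columnar storage. `doc_id` is a hypothetical field you would populate at index time, `search` is any callable that sends a request body and returns the parsed response, and `fake_search` is an in-memory stand-in that only exists to exercise the loop.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical field: a unique keyword value copied into each document at index
# time. keyword fields have doc_values by default, so sorting on it reads
# on-disk columnar data instead of building fielddata on the heap like _uid.
TIEBREAK_FIELD = "doc_id"

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def search_after_body(size: int, after: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build one page request sorted on the doc_values-backed tiebreak field."""
    body: Dict[str, Any] = {
        "size": size,
        "query": {"match_all": {}},
        "sort": [{TIEBREAK_FIELD: "asc"}],
    }
    if after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = after
    return body


def iterate_search_after(search: SearchFn, size: int) -> Iterator[Dict[str, Any]]:
    """Yield every hit, page by page, until a page comes back empty."""
    after: Optional[List[Any]] = None
    while True:
        hits = search(search_after_body(size, after))["hits"]["hits"]
        if not hits:
            return
        yield from hits
        after = hits[-1]["sort"]


def fake_search(body: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory stand-in for a real search call, used only to exercise the loop."""
    docs = [f"id-{i:03d}" for i in range(25)]  # already in ascending sort order
    after = body.get("search_after")
    remaining = [d for d in docs if after is None or d > after[0]]
    page = remaining[: body["size"]]
    return {"hits": {"hits": [{"_id": d, "sort": [d]} for d in page]}}


ids = [hit["_id"] for hit in iterate_search_after(fake_search, size=10)]
print(len(ids), ids[0], ids[-1])  # 25 id-000 id-024
```

This only addresses the memory error; the consistency caveat above still applies, because each page is evaluated against the live index.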

Scrolling is the right tool for deep pagination. Keeping search contexts alive is not necessarily memory-hungry: it simply tells ES to keep the segments the scroll is reading, rather than deleting them once background merges have replaced them. And presumably this deep pagination process is a "background job", not something that hundreds of users are accessing simultaneously.
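A scroll loop along these lines might look like the following minimal sketch. `ScrollClient` is a hypothetical interface: real clients such as elasticsearch-py expose similar search/scroll/clear_scroll calls, but parameter names vary between client versions, and `FakeClient` is an in-memory stand-in used only to exercise the loop. Clearing the scroll in `finally` releases the search context, and the segments it pins, as soon as the job finishes or is abandoned, instead of waiting for the keep-alive to expire.

```python
from typing import Any, Dict, Iterator, List, Protocol


class ScrollClient(Protocol):
    # Hypothetical minimal interface. Real clients (e.g. elasticsearch-py) offer
    # similar search/scroll/clear_scroll calls, but parameter names vary by version.
    def search(self, body: Dict[str, Any], scroll: str) -> Dict[str, Any]: ...
    def scroll(self, scroll_id: str, scroll: str) -> Dict[str, Any]: ...
    def clear_scroll(self, scroll_id: str) -> None: ...


def iterate_scroll(client: ScrollClient, body: Dict[str, Any],
                   keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit of a scroll, then release the search context."""
    resp = client.search(body=body, scroll=keep_alive)
    scroll_id = resp["_scroll_id"]
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                return
            yield from hits
            # Each call renews the keep-alive; it only needs to cover one batch.
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the context (and the old segments it pins) instead of waiting
        # for the keep-alive to expire.
        client.clear_scroll(scroll_id=scroll_id)


class FakeClient:
    """In-memory stand-in used only to exercise the loop."""

    def __init__(self, n_docs: int) -> None:
        self.docs = [{"_id": str(i)} for i in range(n_docs)]
        self.page_size = 0
        self.cursor = 0
        self.cleared: List[str] = []

    def _page(self) -> Dict[str, Any]:
        page = self.docs[self.cursor:self.cursor + self.page_size]
        self.cursor += len(page)
        return {"_scroll_id": "ctx-1", "hits": {"hits": page}}

    def search(self, body: Dict[str, Any], scroll: str) -> Dict[str, Any]:
        self.page_size, self.cursor = body["size"], 0
        return self._page()

    def scroll(self, scroll_id: str, scroll: str) -> Dict[str, Any]:
        return self._page()

    def clear_scroll(self, scroll_id: str) -> None:
        self.cleared.append(scroll_id)


# Sorting on _doc is the cheapest order when every hit is needed anyway.
query = {"size": 1000, "sort": ["_doc"], "query": {"match_all": {}}}
client = FakeClient(n_docs=2500)
print(sum(1 for _ in iterate_scroll(client, query)), client.cleared)  # 2500 ['ctx-1']
```

A short keep-alive per batch keeps the number of pinned segments low, which is the overhead described in the reply below.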

It does come with some overhead, but nothing is free, and that's the cost of scrolling.

Tripti · July 2, 2017, 4:16pm

Thanks, Zachary!

system · July 30, 2017, 4:16pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
