Elasticsearch-hadoop and index.max_result_window

xzhou · November 28, 2018, 9:26pm

Hi All,

I think elasticsearch-hadoop uses the REST API to retrieve data. Does "index.max_result_window", which is 10k by default, take effect in batch job mode? My Spark job needs to retrieve and analyze all of the data in this case, and the index could have 1M docs.

Thanks,
xzhou

james.baiera · December 12, 2018, 8:00pm

index.max_result_window only applies to regular queries to Elasticsearch, and is there mostly as a safeguard against the problems that can occur with deep pagination. Instead of paginating data through the regular search API, ES-Hadoop uses the Scroll API, which creates a longer-lived search context and exports the results out of Elasticsearch over the course of multiple requests. In this case, the documents are returned in their natural internal document order, which does not require any sorting.
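
To make the difference concrete, here is a rough toy model in Python. It is only a sketch: `ToyIndex`, `search`, and `scroll` are hypothetical names for an in-memory stand-in, not the Elasticsearch client API or ES-Hadoop code. It assumes the default window of 10,000 and the 1M-document index from the question.

```python
from typing import Iterator, List

MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


class ToyIndex:
    """In-memory stand-in for an index (hypothetical, not a real client)."""

    def __init__(self, num_docs: int) -> None:
        # Document ids stored in their internal order.
        self.docs = list(range(num_docs))

    def search(self, from_: int, size: int) -> List[int]:
        # Regular search: from + size may not exceed the result window.
        if from_ + size > MAX_RESULT_WINDOW:
            raise ValueError("from + size exceeds index.max_result_window")
        return self.docs[from_:from_ + size]

    def scroll(self, batch_size: int) -> Iterator[List[int]]:
        # Scroll-style export: walk the documents in internal order,
        # one batch per request, with no from offset and no sorting.
        for start in range(0, len(self.docs), batch_size):
            yield self.docs[start:start + batch_size]


index = ToyIndex(1_000_000)

# Deep pagination with from/size is rejected once it passes the window.
try:
    index.search(from_=10_000, size=1)
    deep_page_allowed = True
except ValueError:
    deep_page_allowed = False

# A scroll-style export still reaches every document.
exported = sum(len(batch) for batch in index.scroll(batch_size=1_000))
print(deep_page_allowed, exported)
```

In this model the from/size request past the window fails, while the batch-by-batch export still visits all 1M documents, which is the behaviour a Spark job reading through ES-Hadoop relies on.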

xzhou · December 12, 2018, 8:49pm

Thanks!

system · January 9, 2019, 8:50pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
