Is it possible to get all the documents from an index?
I tried it with Python and requests but always get a `query_phase_execution_exception`: "Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
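For reference, this is roughly the kind of request I was making (a sketch; the host and index name are placeholders, and from + size adds up to the 11 000 from the error):

```python
import requests

# Plain from/size pagination; once from + size goes past 10 000
# (here 10 000 + 1 000 = 11 000) Elasticsearch rejects the request.
resp = requests.get(
    "http://localhost:9200/my-index/_search",  # placeholder host/index
    json={"query": {"match_all": {}}, "from": 10000, "size": 1000},
)
print(resp.json())
```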
I have no idea how the scroll API works, and the documentation isn't helping me either.
Could someone please help?
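You need the scroll API for this. Here's a rough sketch with the elasticsearch-py client (the host, index name, and query are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Run the search with a scroll window so Elasticsearch keeps the
# result set alive between requests; this returns the first batch.
res = es.search(
    index="my-index",
    scroll="2m",
    size=10000,
    body={"query": {"match_all": {}}},
)

scrollId = res["_scroll_id"]
res2 = es.scroll(scroll_id=scrollId, scroll="2m")  # the next 10 000 hits
```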
Where `res` is the result of your previous es search.
You can call `es.scroll` as many times as you need; just remember to update the `scrollId` value each time you make a new request.
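For example, to drain the whole index (a sketch continuing from the code above; the hits list comes back empty once the scroll is exhausted):

```python
all_hits = res["hits"]["hits"]      # first batch from the search
hits = res2["hits"]["hits"]         # second batch from the first scroll
while hits:
    all_hits.extend(hits)
    scrollId = res2["_scroll_id"]   # update the id on every round
    res2 = es.scroll(scroll_id=scrollId, scroll="2m")
    hits = res2["hits"]["hits"]     # an empty list means we're done
```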
Sorry if I wasn't very clear
Ok, so I will get the first 10 000 results. How do I get the rest?
Sorry for the stupid question, but I can't see the forest for the trees right now.
In this example, you have your first 10 000 hits in `res` and the next 10 000 in `res2`. If you want results from 20 000 to 30 000, you just get the new scroll id value from `res2`!
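In code, that third page would look like this (same sketch as before):

```python
# hits 20 000 – 30 000, using the scroll id that came back with res2
res3 = es.scroll(scroll_id=res2["_scroll_id"], scroll="2m")
```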
On one of my indexes I get no data, just: {'timed_out': False, 'hits': {'total': 1843, 'max_score': 1.0, 'hits': []}, '_shards': {'successful': 5, 'total': 5, 'failed': 0}, 'terminated_early': False, '_scroll_id': 'DnF1ZXJ5VGhlbkZldGNoBQAAAAAAARfQFm5rMUVCeUxTVDJHUm5qZ2dBQkpJMncAAAAAAAExGBZyNFIxMV93QVRqT0wtTTNoZ1dUenN3AAAAAAABF88WbmsxRUJ5TFNUMkdSbmpnZ0FCSkkydwAAAAAAAPrrFnpFTW9aaHRPUzd1X0Y0UHRORTFpSFEAAAAAAAExFxZyNFIxMV93QVRqT0wtTTNoZ1dUenN3', 'took': 2}
Any idea on that?