Is there a way to do a scan with a limit?


(Ryan K) #1

Hey, so basically I'm using Elasticsearch to retrieve a lot of data quickly in order to run mapped calculations on it on the server. I'm planning to prepare for millions to tens of millions of documents having to be loaded at once.

I came across the scan helper in the Python client, so I run a scan on each shard and split the shards into separate processes.
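One way to sketch this per-shard parallel split is with Elasticsearch's sliced scroll feature, where each worker scrolls a disjoint slice of the index. The query-building part below is plain Python; the commented lines show where the elasticsearch-py client would come in, and `"my-index"` is a placeholder name.

```python
# Minimal sketch of splitting one scan into parallel sliced scrolls.

def sliced_query(slice_id, max_slices, query=None):
    """Return a scan body with a `slice` clause so that `max_slices`
    workers each scroll a disjoint portion of the index."""
    body = dict(query or {"query": {"match_all": {}}})
    body["slice"] = {"id": slice_id, "max": max_slices}
    return body

# In each worker process (one per slice) you would run something like:
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch()
#   for doc in helpers.scan(es, index="my-index",
#                           query=sliced_query(worker_id, num_workers)):
#       handle(doc)
```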

However, I'd still like to put a threshold on this in case the data ever reaches hundreds of millions of documents; at that point I'd just want a sample of a fixed size.

Please let me know if this feature exists, as I can't seem to find any documentation on it. So far Elasticsearch is exactly what I need; I'm just missing this one small feature.


(David Pilato) #2

When using the scroll API, you can just call clear scroll any time you think you are done (i.e., you have enough data).


(Ryan K) #4

Using the Python scan helper, though, I'm not sure how you can get the scroll ID from it. Also, I split the scan across each of the index's shards for maximum parallelism.
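A possible way around needing the scroll ID at all: close the scan generator once you have enough documents. `helpers.scan` clears its scroll during cleanup by default (its `clear_scroll=True` argument), so closing the generator should be enough. A minimal sketch, with the live-cluster call hedged in a comment and `"my-index"` as a placeholder:

```python
from itertools import islice

def take(docs, n):
    """Return at most n items from `docs`, closing it afterwards so a
    generator like helpers.scan can run its scroll cleanup."""
    it = iter(docs)
    try:
        return list(islice(it, n))
    finally:
        close = getattr(it, "close", None)
        if close is not None:
            close()  # triggers the scan helper's cleanup

# With a live cluster, each worker would do something like:
#   sample = take(helpers.scan(es, index="my-index"), 1_000_000)
```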


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.