Is there a way to do scan with a limit?

Hey, so basically I'm using Elasticsearch to retrieve a lot of data quickly so I can run mapped calculations on it on the server. I'm planning to prepare for millions to tens of millions of documents having to be loaded at once.

I came across the scan helper in the Python client, so I run a scan on each shard and split the shards into separate processes.
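For reference, here's roughly what that looks like. This is just a minimal sketch assuming the 8.x elasticsearch-py client; the cluster URL, the index name "my-index", and the batch size are placeholders for my actual setup:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")

# scan() wraps the scroll API in a generator, so hits stream in batches
# of `size` instead of loading the whole result set into memory.
count = 0
for hit in scan(es, index="my-index", query={"query": {"match_all": {}}}, size=1000):
    doc = hit["_source"]  # the mapped calculation runs on each doc here
    count += 1
print(count)
```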

However, I would still like to put a threshold on this in case the data ever reaches hundreds of millions of documents, in which case I'd just like to take a sample of a fixed size.

Please let me know if this feature exists, as I can't seem to find any documentation on it. So far Elasticsearch is exactly what I need; I'm just missing this small feature.

When using the scroll API, you can just call the clear scroll API any time you think you are done (i.e. you have enough data).
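For example, with the Python client you could cap a plain scroll at a fixed budget and clear the context as soon as you hit it. A rough sketch, assuming the 8.x client; the URL, index name, and the 10,000 limit are just example values:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
LIMIT = 10_000  # example sample-size threshold

resp = es.search(index="my-index", scroll="2m", size=1000,
                 query={"match_all": {}})
scroll_id = resp["_scroll_id"]
docs = []
try:
    while resp["hits"]["hits"] and len(docs) < LIMIT:
        docs.extend(hit["_source"] for hit in resp["hits"]["hits"])
        resp = es.scroll(scroll_id=scroll_id, scroll="2m")
        scroll_id = resp["_scroll_id"]
finally:
    # Free the server-side scroll context as soon as we have enough data.
    es.clear_scroll(scroll_id=scroll_id)

docs = docs[:LIMIT]  # the last batch can overshoot the limit slightly
```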

I'm not sure how you can get the scroll ID out of the Python scan helper, though. Also, I split the scan across each shard the index has for maximum parallelism.
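One thing I'm considering: since scan() returns a generator, it looks like I can just stop consuming it and close it, and the helper should clear the scroll context itself, so I never need the scroll ID directly. Here's an untested sketch of that combined with the per-shard split, again assuming the 8.x client; the index name, URL, and per-shard limit are placeholders, and `preference="_shards:N"` is what pins a search (and its scroll) to a single shard:

```python
from itertools import islice
from multiprocessing import Pool

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

INDEX = "my-index"            # placeholder index name
PER_SHARD_LIMIT = 1_000_000   # example threshold per shard

def scan_shard(shard: int) -> int:
    es = Elasticsearch("http://localhost:9200")  # one client per process
    hits = scan(
        es,
        index=INDEX,
        query={"query": {"match_all": {}}},
        preference=f"_shards:{shard}",  # restrict this scroll to one shard
        size=5000,
    )
    # islice caps the sample without needing the scroll ID; closing the
    # generator lets the helper clean up the scroll context on the server.
    processed = 0
    for hit in islice(hits, PER_SHARD_LIMIT):
        doc = hit["_source"]  # mapped calculation would go here
        processed += 1
    hits.close()
    return processed

if __name__ == "__main__":
    es = Elasticsearch("http://localhost:9200")
    settings = es.indices.get_settings(index=INDEX)
    n_shards = int(settings[INDEX]["settings"]["index"]["number_of_shards"])
    with Pool(n_shards) as pool:
        print(pool.map(scan_shard, range(n_shards)))
```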
