Hi,
I want to fetch a large, fixed number of randomly selected documents from
Elasticsearch (100,000 out of 10M documents) to compute some statistics.
The random selection has to be reproducible so that I get the same documents
with every request.
My problem is that scan and scroll is fast, but as I understand it, the order
is not predictable. Alternatively, I could use the 'random_score' function
with a fixed seed in my query. That would solve the ordering problem, but deep
pagination is very slow. Has anyone done this before? Any ideas or pointers on
how to do this with Elasticsearch?
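For reference, here is a rough sketch of the seeded 'random_score' approach
described above, written as Python dicts for the search request body. The
helper name random_sample_page, the seed 42 and the page size of 1,000 are
illustrative assumptions, not part of any client API. Note that newer
Elasticsearch versions also expect a "field" (e.g. "_seq_no") alongside the
seed for reproducible scores, so check the docs for your version.

```python
import json


def random_sample_page(seed, page, page_size=1000):
    """Hypothetical helper: build a _search body that ranks every document
    by a seeded random score, so the order is identical on each request."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                # Same seed -> same pseudo-random score per doc -> same order.
                "random_score": {"seed": seed},
                # Use only the random score, ignoring the match_all score.
                "boost_mode": "replace",
            }
        },
        # from/size paging: every shard has to collect from + size hits,
        # which is why paging deep into 100,000 results becomes slow.
        "from": page * page_size,
        "size": page_size,
    }


# 100 pages of 1,000 hits each cover a 100,000-document sample.
bodies = [random_sample_page(seed=42, page=p) for p in range(100)]
print(json.dumps(bodies[-1]["query"], indent=2))
print(bodies[-1]["from"])  # 99000
```

The last page starts at offset 99,000, which shows where the cost comes from.
On newer versions such a request is also rejected by default, because
from + size may not exceed index.max_result_window (10,000 unless changed).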
Any help appreciated.
Cheers,
Sebastian