Random sampling in an Elasticsearch-to-Elasticsearch copy

I want to copy some production Elasticsearch indexes and put a sample of them into a test cluster. I can use Logstash to do the copy, and I know I can filter in Logstash using something like:

filter { ruby { code => "event.cancel if rand <= 0.99" } }

However, I would like to do the filtering in Elasticsearch so that Logstash never sees the records it is going to drop. If these were small indexes I could use random_score and size to take a random top-N, but my understanding is that this will not scale to asking for the top 10,000,000 documents. Is there another way?
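
For reference, the random_score plus size approach I mean for small indexes looks roughly like the sketch below. It only builds and prints the search body; the helper name `sampled_top_n_query`, the seed value, and the choice of `_seq_no` as the seed field are my own illustrative assumptions, not anything specific to my cluster. As far as I know, a single search is limited by `index.max_result_window` (10,000 by default), which is why I doubt this works for millions of documents.

```python
import json


def sampled_top_n_query(n, seed=42):
    """Build a search body that scores every document randomly and keeps the top n.

    Hypothetical helper for illustration; the seed and field are assumptions.
    """
    return {
        "size": n,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                # A seed together with a field gives a reproducible random order.
                "random_score": {"seed": seed, "field": "_seq_no"},
                # Use the random value alone as the document score.
                "boost_mode": "replace",
            }
        },
    }


body = sampled_top_n_query(1000)
print(json.dumps(body, indent=2))
```

The printed JSON could be used as the query of a search request; it is fine for a top-1,000 sample, but not for the 10,000,000 case I am asking about.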
