Random sampling when copying from Elasticsearch to Elasticsearch


I want to copy some production Elasticsearch indexes and load a random sample of their documents into a test cluster. I can use Logstash to do the copy, and I know I can filter in Logstash with something like:

filter { ruby { code => "event.cancel if rand <= 0.99" } }
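For readers less familiar with the Ruby filter, here is a minimal Python sketch of the same per-event logic. It assumes events arrive as an iterable; `bernoulli_sample` and `drop_threshold` are hypothetical names, not Logstash APIs. Each event is dropped when a uniform draw in [0, 1) is at most 0.99, so roughly 1% of events survive. The sample is only thinned after every document has already been read.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def bernoulli_sample(events: Iterable[T], drop_threshold: float = 0.99,
                     rng: Optional[random.Random] = None) -> Iterator[T]:
    """Mirror of `event.cancel if rand <= 0.99` (hypothetical helper).

    Each event is dropped when its random draw is <= drop_threshold,
    so about (1 - drop_threshold) of the events are kept.
    """
    rng = rng or random.Random()
    for event in events:
        if rng.random() <= drop_threshold:
            continue  # equivalent of event.cancel
        yield event


if __name__ == "__main__":
    kept = list(bernoulli_sample(range(100_000), rng=random.Random(0)))
    print(len(kept))  # close to 1,000, but varies with the seed
```

Because every document still has to be read before it is discarded, this matches the cost the question wants to avoid.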

However, I would like to do the filtering in Elasticsearch, so that Logstash never sees the records it would drop. If the indexes were small, I could use random_score together with size to take the top N documents, but my understanding is that this will not scale to requesting the top 10,000,000 documents. Is there another way?
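As an illustration of the small-index approach, the sketch below builds a search body that gives every document a random score and asks for the N highest, which amounts to a random sample of size N. The `seed` value and the choice of `_seq_no` as the random_score field are illustrative assumptions, and `random_top_n_body` is a hypothetical helper. The guard reflects the documented default of `index.max_result_window` (10,000), which caps from + size for an ordinary search and is one concrete reason this does not stretch to 10,000,000 documents.

```python
import json
from typing import Any, Dict

# Default value of the index.max_result_window setting. A cluster may
# override it, so treat this constant as an assumption.
DEFAULT_MAX_RESULT_WINDOW = 10_000


def random_top_n_body(n: int, seed: int = 42) -> Dict[str, Any]:
    """Build a search body returning n randomly chosen documents.

    Hypothetical helper: every document gets a random score, and the
    n highest scores are returned. The seed/field pair makes the
    ordering reproducible; both values here are illustrative.
    """
    if n > DEFAULT_MAX_RESULT_WINDOW:
        # A plain from + size request beyond the window is rejected
        # by default, so this approach cannot reach 10,000,000 hits.
        raise ValueError(
            f"size {n} exceeds max_result_window ({DEFAULT_MAX_RESULT_WINDOW})"
        )
    return {
        "size": n,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "random_score": {"seed": seed, "field": "_seq_no"},
                "boost_mode": "replace",  # use the random score alone
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(random_top_n_body(1_000), indent=2))
```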
