I'm trying to re-index data from one ES cluster to another with a { "match_all": {} } query, but I'd like to limit the result set to a certain number of documents. Currently, all "total" hits are being fetched from the source ES cluster.
Is there a way to stop the input when reaching a certain number of documents?
The "size" parameter of the ES input plugin applies only to scrolling (the number of hits returned per scroll request), not to the total number of documents fetched.
For what it's worth, _reindex in Elasticsearch 2.3+ supports limiting the number of documents copied. I added that because it makes it very convenient to play with smaller data sets.
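A minimal sketch of such a size-limited _reindex request, written with only the Python standard library. It assumes the 2.3-era top-level `size` field in the request body (later versions use `max_docs` for this) and that the source index is reachable from the cluster that runs the _reindex call. The index names and the helper functions `build_reindex_body` and `send_reindex` are hypothetical, for illustration only.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index, limit):
    """Build a _reindex body that copies at most `limit` documents.

    Assumes the ES 2.3-era top-level "size" field; newer releases
    call this field "max_docs".
    """
    if limit <= 0:
        raise ValueError("limit must be a positive number of documents")
    return {
        "size": limit,  # stop after this many documents have been copied
        "source": {
            "index": source_index,
            "query": {"match_all": {}},  # same query as in the question
        },
        "dest": {"index": dest_index},
    }


def send_reindex(base_url, body):
    """POST the body to <base_url>/_reindex and return the parsed JSON reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical index names; prints the request body without contacting a cluster.
    print(json.dumps(build_reindex_body("big_index", "big_index_sample", 10000), indent=2))
```

Calling `send_reindex("http://localhost:9200", body)` would submit the request to a node assumed to be listening on that URL; building the body separately makes it easy to inspect before anything is copied.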
@warkolm, the use case is to split one big index into chunks in order to analyse them separately. @nik9000, thank you for the hint! You are a genius!