We are using Logstash to reindex from an Elasticsearch 2.4 cluster to an Elasticsearch 5.5 one.
It takes 12 hours to copy all the documents using 10 Logstash pipelines, which is fine for us.
The problem is that all our documents live in the same index, and they can be updated or deleted while the copy is running. Since a scroll works on a snapshot of the index taken when it starts, how can we apply the changes made to the documents during those 12 hours?