I have a cluster with ~2 TB of data made up of a lot of very small documents. I need to migrate from 1.x to 2.x. The problem I'm having is that when I restore from a snapshot, the restore keeps the data's original mappings and ignores my index templates.
I need the new mappings! If I try to do the migration with Logstash it takes a massive amount of time, more than we can support. Is there another export/import method I can use?
Reindex-from-remote works, but I think it'll be about as slow as Logstash.
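For reference, here's roughly what a reindex-from-remote call looks like, sketched with Python's `requests` library against the plain `_reindex` REST API. The host and index names are placeholders, and the old cluster has to be whitelisted via `reindex.remote.whitelist` on the new one:

```python
# Minimal reindex-from-remote sketch (Elasticsearch 5.x _reindex API).
# Hostnames and index names are placeholders; adapt them to your clusters.
import requests

NEW_CLUSTER = "http://new-cluster:9200"  # 5.x cluster that runs the reindex
OLD_CLUSTER = "http://old-cluster:9200"  # 1.x cluster being migrated; must be
                                         # listed in reindex.remote.whitelist

body = {
    "source": {
        "remote": {"host": OLD_CLUSTER},
        "index": "my-index",
        "size": 1000,  # scroll batch size; tune for your small documents
    },
    # A fresh destination index picks up mappings from your templates.
    "dest": {"index": "my-index-v2"},
}

resp = requests.post(f"{NEW_CLUSTER}/_reindex", json=body)
resp.raise_for_status()
print(resp.json())
```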
I'd skip straight from 1.x to 5.x if possible. 5.6 will have rolling upgrades into 6.x. You'd still have to reindex before upgrading to 7.0, but hopefully it'll amount to less work.
Is the data static? If so, you might be able to bring a second copy of the index online slowly, outside of any downtime window. You'd need duplicate hardware, but that's sometimes something you can swing if you only plan on needing it for a few weeks.
If possible, you could parallelize the reindex process. Sliced scrolling isn't supported back in 1.x, but you can slice manually by limiting each reindex with a query, reindexing a couple of days of data at a time. There's a rough sketch of that below.
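A rough sketch of that manual slicing, assuming time-based data with a `timestamp` date field; the field, hosts, and index names are all placeholders:

```python
# Rough sketch of a manually sliced reindex-from-remote: one _reindex call
# per window of data, limited by a range query on a timestamp field.
# The query is executed on the remote 1.x cluster, so it has to be valid
# there; a basic range query is.
import datetime
import requests

NEW_CLUSTER = "http://new-cluster:9200"
OLD_CLUSTER = "http://old-cluster:9200"

def reindex_window(start: datetime.date, end: datetime.date) -> None:
    """Reindex documents whose timestamp falls in [start, end)."""
    body = {
        "source": {
            "remote": {"host": OLD_CLUSTER},
            "index": "my-index",
            "query": {
                "range": {
                    "timestamp": {
                        "gte": start.isoformat(),
                        "lt": end.isoformat(),
                    }
                }
            },
        },
        "dest": {"index": "my-index-v2"},
    }
    requests.post(f"{NEW_CLUSTER}/_reindex", json=body).raise_for_status()

# Walk the date range in two-day windows. Each window is independent,
# so you can run several of them from separate processes in parallel.
day = datetime.date(2017, 1, 1)
while day < datetime.date(2017, 3, 1):
    reindex_window(day, day + datetime.timedelta(days=2))
    day += datetime.timedelta(days=2)
```

For long windows you may prefer `?wait_for_completion=false` on the `_reindex` request and polling the task API, rather than holding an HTTP connection open for the whole run.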