Are there any tools that read all documents in an index, and write them to a new index?
This is useful if you have an index with, say, 50 shards spread over 50 machines, and you want to expand to 100 machines. You would create a new index, read all documents from the old index, write them into the new index, and then swap the index names.
There are some problems with the naive approaches:
how do you make it run in parallel, so it works on all nodes at once?
how do you make it resumable, so that if it crashes it can pick up where it left off?
I mean, I think I could dive in and write this myself, but it would be nice if something worked out of the box.
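One way to get the resumability described above is to checkpoint progress by document id after every bulk write, so a crashed run restarts from the last checkpoint instead of from scratch. Here is a minimal sketch of that idea; the `read_batch`, `write_batch`, and checkpoint callables are hypothetical placeholders for real scroll-and-bulk calls against the cluster, and the in-memory demo below only simulates the two indices:

```python
# Hedged sketch: resumable batch copy from an old index to a new one,
# keyed on a sortable document id. Not a drop-in tool; the callables
# are stand-ins for real scroll/bulk requests.

def reindex(read_batch, write_batch, save_checkpoint, load_checkpoint,
            batch_size=2):
    # Resume from the last checkpointed id (None means start from the top).
    last_id = load_checkpoint()
    while True:
        batch = read_batch(after_id=last_id, size=batch_size)
        if not batch:
            break
        write_batch(batch)           # bulk-index this batch into the new index
        last_id = batch[-1]["_id"]   # highest id copied so far
        save_checkpoint(last_id)     # persist progress before the next batch


# --- toy in-memory demo --------------------------------------------------
old_index = [{"_id": i, "body": f"doc {i}"} for i in range(7)]
new_index = []
checkpoint = {"last": None}

def read_batch(after_id, size):
    remaining = [d for d in old_index
                 if after_id is None or d["_id"] > after_id]
    return remaining[:size]

reindex(read_batch, new_index.extend,
        lambda i: checkpoint.update(last=i),
        lambda: checkpoint["last"])

assert new_index == old_index
```

For the parallelism question, the same scheme extends naturally: give each worker a disjoint id range (or one source shard each) and a separate checkpoint, and the workers never contend with each other.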
This is a good thread. I'm not a big fan of having one shard per machine. When deciding on an index/shard model, one should pick a reasonable number of indices and shards per index, so that there is isolation, no single point of failure, and, more importantly, room to scale in the future.
Coming to this question: this will be possible only when you have _source enabled, or you are storing all fields; otherwise you won't be able to retrieve the full documents.
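To make the prerequisite concrete, a mapping along these lines keeps the original JSON of every document available for re-reading (a minimal fragment; in Elasticsearch `_source` is enabled by default, so this only matters if it was explicitly disabled to save space):

```
{
  "mappings": {
    "_source": { "enabled": true }
  }
}
```

If `_source` is disabled and not every field is stored, the original documents cannot be reconstructed from the index, and a copy-to-new-index tool has nothing to read.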