Read all documents in an index and write to a new index (with more shards)


(Kevin Burton) #1

Are there any tools that read all documents in an index, and write them to a new index?

This is useful if you have an index with, say, 50 shards on 50 machines and you want to expand to 100 machines. You would just create a new index, read all documents from the old index, write them to the new index, then swap the index names.
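A rough sketch of the two request bodies that workflow needs, assuming the documents are written through the bulk API and the "swap" is done by moving an alias rather than renaming an index. The names `docs`, `docs_v1`, and `docs_v2` are made up for illustration, and older Elasticsearch versions may also expect a `_type` in each bulk action line, which is left out here.

```python
import json


def bulk_body(hits, dest_index):
    """Turn search hits into a newline-delimited bulk request body.

    Each hit is expected to carry "_id" and "_source". Reusing the original
    "_id" means writing the same document twice simply overwrites it.
    """
    lines = []
    for hit in hits:
        # Action line: index this document into the destination index.
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        # Source line: the original document body.
        lines.append(json.dumps(hit["_source"]))
    # Every line, including the last one, ends with a newline.
    return "".join(line + "\n" for line in lines)


def alias_swap(alias, old_index, new_index):
    """Build a body for the _aliases endpoint that moves `alias` in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    sample_hits = [{"_id": "1", "_source": {"title": "hello"}}]
    print(bulk_body(sample_hits, "docs_v2"), end="")
    print(json.dumps(alias_swap("docs", "docs_v1", "docs_v2"), indent=2))
```

If the application only ever talks to the alias, the switch to the new index needs no client changes.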

There are some problems with some naive approaches:

  • How do you make it run in parallel so that it works on all nodes at once?

  • How do you make it resumable, so that if it crashes it can pick up where it left off?
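One way to approach both problems, sketched with in-memory dicts standing in for the old and new index: split the documents into slices with a stable hash, give each slice to its own worker, and have each worker record its progress after every batch. Everything here is hypothetical scaffolding (`partition`, `copy_slice`, `copy_all`, the checkpoint file layout), and the integer offset stands in for whatever resumable position the real read API offers.

```python
import json
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(source, workers):
    """Split {doc_id: body} into `workers` slices using a stable hash.

    zlib.crc32 is used instead of hash() because hash() of a str differs
    between Python processes, which would break resuming after a restart.
    """
    slices = [[] for _ in range(workers)]
    for doc_id in sorted(source):
        bucket = zlib.crc32(doc_id.encode("utf-8")) % workers
        slices[bucket].append((doc_id, source[doc_id]))
    return slices


def read_offset(path):
    """Return how many docs of this slice were already copied (0 if none)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["offset"]


def write_offset(path, offset):
    """Persist progress: write a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, path)


def copy_slice(slice_id, docs, dest, ckpt_dir, batch_size=2, crash_after=None):
    """Copy one slice in batches, checkpointing after every batch.

    Returns the number of documents written by this call.
    `crash_after` is a test hook that simulates a crash after N batches.
    """
    path = os.path.join(ckpt_dir, "slice-%d.json" % slice_id)
    start = read_offset(path)
    written = 0
    batches = 0
    for pos in range(start, len(docs), batch_size):
        if crash_after is not None and batches >= crash_after:
            raise RuntimeError("simulated crash in slice %d" % slice_id)
        batch = docs[pos:pos + batch_size]
        for doc_id, body in batch:
            dest[doc_id] = body  # same ID in the new index, so replays overwrite
        write_offset(path, pos + len(batch))
        written += len(batch)
        batches += 1
    return written


def copy_all(source, dest, ckpt_dir, workers=4):
    """Run one copy_slice per worker thread and return total docs written."""
    os.makedirs(ckpt_dir, exist_ok=True)
    slices = partition(source, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_slice, i, docs, dest, ckpt_dir)
                   for i, docs in enumerate(slices)]
        return sum(f.result() for f in futures)
```

Because each worker writes only its own checkpoint file, no locking is needed, and because documents keep their IDs, a batch that was written but not yet checkpointed when the crash hit is simply overwritten on the next run.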

I mean, I think I can dive in and write this, but it would be nice if something worked out of the box.


(Imran Siddique) #2

This is a good thread. I'm not a big fan of having one shard per machine. When settling on an index/shard model, one should pick a reasonable number of indices and shards per index, so that there is isolation and no single point of failure and, more importantly, room to scale in the future.

Coming to the question: this is only possible if you have _source enabled or are storing all fields; otherwise you won't be able to retrieve the full documents.
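A small pre-flight check along those lines, assuming that a hit returned from an index without _source simply has no "_source" key. The helper name `missing_source` is made up, and the sample hits are hand-written rather than real search output.

```python
def missing_source(hits):
    """Return the IDs of hits that cannot be copied because they lack _source."""
    return [hit["_id"] for hit in hits if "_source" not in hit]


if __name__ == "__main__":
    sample_hits = [
        {"_id": "1", "_source": {"title": "hello"}},
        {"_id": "2"},  # no _source: the original body cannot be recovered
    ]
    print(missing_source(sample_hits))
```

Running this against a sample of hits before starting a full copy gives an early warning that the new index would end up incomplete.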


(Mark Walkom) #3

There are a few tools out there and you can DIY.

I use Logstash - https://gist.github.com/markwalkom/8a7201e3f6ea4354ae06


(Kevin Burton) #4

Yeah. Right now you need _source but we're doing that anyway. Not sure what % of people are doing this though.

