Read all documents in an index and write to a new index (with more shards)

burtonator · August 14, 2015, 7:08pm

Are there any tools that read all documents in an index, and write them to a new index?

This is useful if you have an index with, say, 50 shards on 50 machines, and you want to expand to 100 machines. You would create a new index with more shards, read all documents from the old index, write them to the new index, then swap the index names.
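For the swap step, one option is to have clients query an alias and move that alias, rather than renaming anything. This is a swapped-in technique, not something required by the plan above. Elasticsearch's `_aliases` endpoint accepts a list of `remove`/`add` actions and applies them atomically. A minimal sketch that only builds the request body; the alias and index names are hypothetical:

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(json.dumps(alias_swap_body("docs", "docs_v1", "docs_v2"), indent=2))
```

This assumes the application already reads and writes through the alias, so moving it is the only change clients see.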

There are some problems with some naive approaches:

How do you make it run in parallel, so that it works on all nodes at once?
How do you make it resumable, so that if it crashes it can pick up where it left off?
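A rough sketch of how both problems could be handled together: split the source into partitions, copy each partition in its own worker, and record a per-partition checkpoint after every batch that is written. Everything here is an assumption for illustration. `read_batch` and `write_batch` are hypothetical callables that you would back with real search and bulk requests, and the checkpoint is just a local JSON file.

```python
import json
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class Checkpoint:
    """Persists the next offset to copy for each partition in a JSON file."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()
        self.offsets = {}
        if os.path.exists(path):
            with open(path) as f:
                self.offsets = json.load(f)

    def get(self, partition):
        return self.offsets.get(str(partition), 0)

    def set(self, partition, offset):
        with self.lock:
            self.offsets[str(partition)] = offset
            tmp = self.path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(self.offsets, f)
            # Rename over the old file so a crash never leaves it half-written.
            os.replace(tmp, self.path)


def copy_partition(partition, read_batch, write_batch, checkpoint, batch_size):
    """Copy one partition, starting from its last recorded offset."""
    offset = checkpoint.get(partition)
    while True:
        docs = read_batch(partition, offset, batch_size)
        if not docs:
            return offset
        write_batch(docs)
        offset += len(docs)
        # Record progress only after the batch has been written.
        checkpoint.set(partition, offset)


def copy_all(partitions, read_batch, write_batch, checkpoint_path,
             batch_size=500, workers=4):
    """Copy all partitions in parallel; returns the final offset of each."""
    checkpoint = Checkpoint(checkpoint_path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(copy_partition, p, read_batch, write_batch,
                        checkpoint, batch_size)
            for p in partitions
        ]
        return [f.result() for f in futures]
```

If the process dies between a write and its checkpoint, that one batch is written again on restart, so writes need to be idempotent. Indexing each document under its original ID gives you that, since the second write just overwrites the first. How a partition is made readable from an offset (for example, a range over a sorted field) depends on the data and is left out here.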

I think I can dive in and write this myself, but it would be nice if something worked out of the box.

mosiddi · August 15, 2015, 6:12am

This is a good thread. I'm not a big fan of having 1 shard per machine. When settling on an index/shard model, one should pick a reasonable number of indices and shards per index, so that there is isolation and no single point of failure and, more importantly, room for scaling in the future.

Coming to this question: this is possible only if you have _source enabled or you are storing all fields. Otherwise you won't be able to retrieve all the docs.

warkolm · August 15, 2015, 6:59am

There are a few tools out there and you can DIY.

I use Logstash - https://gist.github.com/markwalkom/8a7201e3f6ea4354ae06

burtonator · September 9, 2015, 9:25pm

Yeah. Right now you need _source, but we're storing that anyway. Not sure what % of people are doing this, though.
