Hey gang.
We've implemented an internal tool that reads from one or more existing ES indexes and builds a new ES index from them.
The main use cases are:
- taking daily / immutable indexes and merging them into a new index, to cut the total number of indexes and with it the memory use and shard count.
- re-sharding an index to increase the number of shards when you add more hardware. So say you have 20 shards... and then you grow to 20 boxes. Well, you have to re-index everything into a new index with a different number of shards. This will do that.
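To show why re-sharding means re-indexing everything, here's a rough sketch. ES picks a shard for a document by hashing its routing value (the doc id by default) modulo the number of primary shards, so changing the shard count changes where documents land. The CRC32 hash, the helper names, and the 20 -> 40 numbers are just assumptions for illustration; ES uses its own hash function internally.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard for a document id.

    Assumption: CRC32 is only a stand-in hash for illustration,
    not the hash Elasticsearch actually uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def moved_fraction(doc_ids, old_shards, new_shards):
    """Fraction of documents whose shard number changes between layouts."""
    moved = sum(
        1
        for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )
    return moved / len(doc_ids)


# Hypothetical document ids and shard counts.
doc_ids = ["doc-%d" % i for i in range(10000)]
print(moved_fraction(doc_ids, 20, 40))
```

A good chunk of the documents end up on a different shard number, which is why the data has to be written out again into a new index rather than moved in place.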
It basically adds a new daemon that runs alongside your ES daemon. The daemon uses the shard routing information to run a scan on the same box as the primary for each shard, so you get data locality.
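Roughly, the planning step looks something like the sketch below. This isn't our actual code, just an illustration: it assumes the routing info comes in the shape of the `shards` array returned by the `_search_shards` API (a list of shard groups, each a list of copies with `index`, `shard`, `node`, `primary` and `state` fields). The function name and the sample data are made up.

```python
from collections import defaultdict


def primaries_by_node(search_shards_response):
    """Group started primary shard copies by the node that hosts them.

    Each node's daemon would then scan only the shards listed for it.
    """
    plan = defaultdict(list)
    for shard_group in search_shards_response["shards"]:
        for copy in shard_group:
            if copy.get("primary") and copy.get("state") == "STARTED":
                plan[copy["node"]].append((copy["index"], copy["shard"]))
    return {node: sorted(shards) for node, shards in plan.items()}


# Hypothetical routing info: one index, two shards, one replica each.
sample = {
    "shards": [
        [
            {"index": "logs-a", "shard": 0, "node": "node-a",
             "primary": True, "state": "STARTED"},
            {"index": "logs-a", "shard": 0, "node": "node-b",
             "primary": False, "state": "STARTED"},
        ],
        [
            {"index": "logs-a", "shard": 1, "node": "node-b",
             "primary": True, "state": "STARTED"},
            {"index": "logs-a", "shard": 1, "node": "node-a",
             "primary": False, "state": "STARTED"},
        ],
    ]
}

print(primaries_by_node(sample))
```

The daemon on each box picks up its own entry from the plan and scans only those shards, so the read side of the scan stays local.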
There's a 'controller' app that you run to tell it which indexes to read from and which ones to write to.
If there's interest we'll OSS this... It might take some time though, because we have to refactor our code to make it easier to OSS. Long story... issues with git submodules.
If Elastic wants to license this to include it as part of ES we would be fine with that and would try to get this done sooner.
... and now $100 says that there's already an awesome tool that does this that we didn't see.
I know there's one plugin that can do re-indexing, but it didn't seem to be maintained or very far along.