Elasticsearch mass data manipulation

Hi All,

We are going to move our Elasticsearch server to a new host, so we will have to somehow copy the entire data set from the old host to the new one. Elastic suggested doing that by creating a cluster and letting the new host sync with the old one, which is probably the right way to go, except that in our case there is a little twist: we would like to use this opportunity to clean up the data, that is, delete some garbage, fix bogus field names, etc.

So basically what we want to do is this:

  • Export the data from the old server to some text format (JSON or, even better, CSV)
  • Run some scripts to fix the data in text form
  • Use Logstash to parse the fixed data and load it into Elasticsearch on the new server (a rough sketch of these three steps follows below)
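
Here is a minimal sketch of that round trip in Python, assuming the official elasticsearch-py client; the hosts, the index name, and the field fixes are all made up for illustration, and step 3 uses the bulk helper instead of Logstash just to keep the sketch self-contained:

```python
import json

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

old_es = Elasticsearch("http://old-host:9200")
new_es = Elasticsearch("http://new-host:9200")

# Step 1: export every document to newline-delimited JSON.
with open("dump.ndjson", "w") as out:
    for hit in scan(old_es, index="logs", query={"query": {"match_all": {}}}):
        out.write(json.dumps(hit["_source"]) + "\n")

# Step 2: fix the data in text form (both fixes are hypothetical).
def clean(doc):
    doc.pop("garbage_field", None)            # delete some garbage
    if "hostnme" in doc:                      # fix a bogus field name
        doc["hostname"] = doc.pop("hostnme")
    return doc

# Step 3: bulk-load the cleaned documents into the new cluster.
def actions():
    with open("dump.ndjson") as f:
        for line in f:
            yield {"_index": "logs", "_source": clean(json.loads(line))}

bulk(new_es, actions())
```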

So if anyone here has done something like this before and can share info about existing tools that may help, or any tricks and tips, please do. I promise to share our solution in return, or at least parts of it, when we are done.

Many thanks,

Oren

Why not just do the entire thing in Logstash and save yourself the export to CSV?
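
For anyone who prefers scripting it, here is a rough Python analogue of that in-flight idea (read from the old cluster, clean each document, write to the new one, with no intermediate file); the hosts, index name, and cleanup are made up, and the actual Logstash recipe is in the link below:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

old_es = Elasticsearch("http://old-host:9200")
new_es = Elasticsearch("http://new-host:9200")

def cleaned_docs():
    # Stream every document out of the old cluster and fix it in flight.
    for hit in scan(old_es, index="logs", query={"query": {"match_all": {}}}):
        doc = hit["_source"]
        doc.pop("garbage_field", None)  # hypothetical cleanup
        yield {"_index": "logs", "_source": doc}

bulk(new_es, cleaned_docs())
```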

I wrote a recipe that uses Logstash here: http://david.pilato.fr/blog/2015/05/20/reindex-elasticsearch-with-logstash/

Note that you can use the Reindex API. It might help.
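
For example, a remote reindex with an inline cleanup script might look like the sketch below, assuming an Elasticsearch version that supports reindex-from-remote and that the old host is listed under reindex.remote.whitelist in the new cluster's elasticsearch.yml; the hosts, index, and field names are made up:

```python
import requests

# Ask the new cluster to pull documents from the old one, cleaning
# each document with a Painless script as it is copied.
body = {
    "source": {
        "remote": {"host": "http://old-host:9200"},
        "index": "logs",
    },
    "dest": {"index": "logs"},
    "script": {
        "lang": "painless",
        "source": "ctx._source.remove('garbage_field')",  # hypothetical fix
    },
}

resp = requests.post("http://new-host:9200/_reindex", json=body)
print(resp.json())
```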