Reindex data of size 24GB

I have an index in ES 1.7 which has around 24 GB of data. I'd like to reindex that to ES 5.0.0. Can I do that in one go? Also the mapping of the 1.7 version is causing errors (empty string field names). How do I fix that?

A couple of things:

  1. 5.0 is pretty old at this point. If you are upgrading, I'd jump to the latest.
  2. Reindex doesn't attempt to copy the mapping from the source index, just the documents. You'll need to create the destination index before you start the reindex process.
  3. I usually recommend splitting big reindex tasks on some natural identifier in the data, such as a date field or a type field, and then using queries in the reindex to copy one chunk at a time. That lets you see progress as each chunk completes, and it lets you parallelize the chunks by running reindex in a couple of terminals, or with more complex tooling if you are so inclined.
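
To make points 2 and 3 concrete, here is a minimal Python sketch that only builds the two request bodies involved: one to create the destination index with an explicit mapping, and one for a reindex-from-remote call that can optionally carry a query. The host, index names, type name (`event`), and fields are all hypothetical placeholders, not taken from the original index. It also assumes the destination cluster is 5.x and lists the old host under `reindex.remote.whitelist` in its `elasticsearch.yml`.

```python
import json

# Hypothetical names: replace with your own cluster address and indices.
OLD_CLUSTER = "http://old-es17-host:9200"  # assumed address of the 1.7 cluster
SOURCE_INDEX = "my_index"                  # assumed source index name
DEST_INDEX = "my_index_v5"                 # assumed destination index name


def destination_index_body():
    """Body for PUT /<DEST_INDEX>. Reindex will not copy the old mapping,
    so the destination is created up front with the mapping you want.
    The type name and fields below are illustrative only."""
    return {
        "mappings": {
            "event": {
                "properties": {
                    "timestamp": {"type": "date"},
                    "message": {"type": "text"},
                    "user": {"type": "keyword"},
                }
            }
        }
    }


def reindex_body(query=None):
    """Body for POST /_reindex on the new cluster, pulling from the old one.
    Pass a query to copy only one chunk of the data."""
    source = {"remote": {"host": OLD_CLUSTER}, "index": SOURCE_INDEX}
    if query is not None:
        source["query"] = query
    return {"source": source, "dest": {"index": DEST_INDEX}}


print(json.dumps(destination_index_body(), indent=2))
print(json.dumps(reindex_body(), indent=2))
```

The printed JSON can be sent with curl or any HTTP client: first the index creation, then the reindex.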

Hey, thanks for the reply. I didn't fully understand the 3rd point. The empty string field names are the only issue with the 1.7 documents, so can't I just fix that in the source (make it null) and index the data again to the 6.5 version? I don't understand how to split the reindex task using a natural identifier, or the part after that.

Also I'm more inclined towards a 5.x version because it still supports mapping of multiple types.

That makes sense. Change as few things as possible. But understand that you'll have to do this again sooner.

Sure. You can change it in the _source as part of the reindex if you'd like or you can change it in the 1.7 index. Whichever works for you works for this.
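
As a rough sketch of the "change it in the _source as part of the reindex" option, the Python below builds a reindex body with a Painless script that drops a top-level field whose name is the empty string. Several things here are assumptions: the index names are placeholders, the script only handles a top-level empty key (nested objects would need more work), and the script key is `inline`, which is what 5.0 used (later versions renamed it to `source`). Try it on a small test index first.

```python
import json


def reindex_with_cleanup_body(source_index, dest_index, script_key="inline"):
    """Body for POST /_reindex that removes the empty-named top-level field
    from each document before it is written to the destination.

    script_key is "inline" on 5.0; newer versions expect "source" instead.
    """
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            # Map.remove does nothing if the key is absent, so documents
            # without an empty-named field pass through unchanged.
            script_key: "ctx._source.remove('')",
        },
    }


# Hypothetical index names, for illustration only.
body = reindex_with_cleanup_body("my_index", "my_index_v5")
print(json.dumps(body, indent=2))
```

If you would rather keep the value, the same script slot can move it to a properly named field instead of discarding it.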

Find some field like a timestamp. Make a list of queries that split the data in your index on that timestamp but together cover all the documents. Something like:

  • before 2016
  • January 2016
  • February 2016
  • March 2016
    ...
  • December 2018
  • After 2018

Reindex each one of those individually. That way you can measure the progress. You can know that you've finished up to a certain date. You can start the month over if it fails. You can run a few of them in parallel if you'd like.
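
Here is one way that list of chunks could be generated, as a hedged Python sketch. It assumes a date field called `timestamp` and placeholder index names, none of which come from the original index. It produces one range query for everything before 2016, one per month from January 2016 to December 2018, and one for everything after 2018. Documents with no `timestamp` at all would not match any chunk, so check for those separately.

```python
import json
import urllib.request
from datetime import date

TIMESTAMP_FIELD = "timestamp"  # assumed name of the date field


def chunk_queries(first_year=2016, last_year=2018):
    """Return range queries: before first_year, each month, after last_year."""
    starts = [date(y, m, 1)
              for y in range(first_year, last_year + 1)
              for m in range(1, 13)]
    end = date(last_year + 1, 1, 1)
    bounds = starts + [end]

    queries = [{"range": {TIMESTAMP_FIELD: {"lt": starts[0].isoformat()}}}]
    for lo, hi in zip(bounds, bounds[1:]):
        # Half-open interval [lo, hi) so no document lands in two chunks.
        queries.append({"range": {TIMESTAMP_FIELD: {
            "gte": lo.isoformat(), "lt": hi.isoformat()}}})
    queries.append({"range": {TIMESTAMP_FIELD: {"gte": end.isoformat()}}})
    return queries


def chunk_body(query, source_index="my_index", dest_index="my_index_v5"):
    """Reindex body for a single chunk (index names are placeholders)."""
    return {"source": {"index": source_index, "query": query},
            "dest": {"index": dest_index}}


def run_chunk(body, es_url="http://localhost:9200"):
    """POST one chunk to /_reindex and return the parsed response.
    es_url is an assumed address; this function is not called below."""
    req = urllib.request.Request(
        es_url + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


queries = chunk_queries()
print(len(queries), "chunks")
print(json.dumps(chunk_body(queries[1]), indent=2))
```

Calling `run_chunk` in a loop gives you a natural progress log, since each response reports how many documents were created, and a failed month can simply be rerun. A few chunks can be run at the same time from separate terminals or a small thread pool.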
