I'm migrating from ES5 to ES7: I load the indices in ES6, reindex them, take a snapshot, then restore the snapshot in ES7 and reindex again. I have noticed that the index size varies dramatically.
Reindexing from ES5 to ES6 reduced the index sizes.
The index I'm currently testing is 2.4 GB after restoring in ES7, and it was also 2.4 GB before taking the snapshot, so all good there.
But if I reindex it on ES7, the index size grows to 4.2 GB. Both indices have the same settings (5 shards, 1 replica, no additional settings applied).
Additionally, the ES7 index has 7666735 docs (same as ES6), but docs.deleted is 1653461. During the reindex I noticed that the index size kept increasing even after all documents had been added.
Can someone give a hint or explain why this happens?
Why does a reindex increase the size by 75%?
Why are there suddenly docs.deleted?
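
To make the numbers concrete, here is a small sketch that recomputes the growth figure and the share of deleted documents from the values quoted above. The helper names are made up for illustration, and it assumes docs.deleted counts documents that are still physically present in the segments next to the live ones.

```python
# Figures quoted in this post.
restored_gb = 2.4        # index size after restoring the snapshot in ES7
reindexed_gb = 4.2       # index size after reindexing on ES7
live_docs = 7_666_735    # docs.count
deleted_docs = 1_653_461 # docs.deleted


def relative_growth(before, after):
    """Size increase relative to the starting size."""
    return (after - before) / before


def deleted_fraction(live, deleted):
    """Share of all stored documents that are marked as deleted."""
    return deleted / (live + deleted)


growth = relative_growth(restored_gb, reindexed_gb)
deleted_share = deleted_fraction(live_docs, deleted_docs)

print(f"size growth: {growth:.0%}")
print(f"deleted share: {deleted_share:.1%}")
```

So roughly one in six stored documents was marked as deleted at that point, which on its own does not account for a 75% size increase.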
It seems the reindex produces a lot of segments, which are then merged again after some time.
After waiting around an hour, the index settled at 3.6 GB (it was around 7 GB at times), but the deleted document count was still high.
Then I ran _forcemerge?max_num_segments=1 and it removed all the deleted documents.
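
For reference, the force merge step above could be scripted roughly like this. It is only a sketch using the Python standard library: the host `http://localhost:9200` and the index name `my-index-es7` are placeholders, not the real values from my cluster, and the request is only built here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def forcemerge_request(base_url, index, max_num_segments=1):
    """Build a POST request for the _forcemerge endpoint of one index."""
    query = urlencode({"max_num_segments": max_num_segments})
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"
    return Request(url, method="POST")


# Placeholder host and index name, for illustration only.
req = forcemerge_request("http://localhost:9200", "my-index-es7")
print(req.get_method(), req.full_url)

# To actually run the merge against a cluster:
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     print(resp.read().decode())
```

Merging down to a single segment per shard can take a while on a large index, so the call may block for some time.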
Now the doc counts match and both indices have 0 docs.deleted, but the ES6 index (force-merged) is 2.3 GB and the ES7 index (force-merged) is 2.6 GB. How can that be?
Each segment in both indices contains around 1 530 000 documents; in the ES6 index a segment is ~490 MB, in the ES7 index ~540 MB.
Segments are all version 8.2.0 now. Does that mean the index was fully upgraded?
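
This is roughly how the segment versions can be collected from the JSON returned by `GET /<index>/_segments`. It is a sketch: the nesting (indices, then shards, then a list of shard copies, each with a `segments` map) is the response shape as I understand it, and the sample data below is made up, not real output from my cluster.

```python
def segment_versions(segments_response):
    """Return {index name: set of Lucene versions found in its segments}."""
    versions = {}
    for index_name, index_data in segments_response.get("indices", {}).items():
        found = set()
        for shard_copies in index_data.get("shards", {}).values():
            for shard_copy in shard_copies:  # primary and replica copies
                for segment in shard_copy.get("segments", {}).values():
                    found.add(segment.get("version"))
        versions[index_name] = found
    return versions


# Made-up sample in the assumed response shape (index name is hypothetical).
sample = {
    "indices": {
        "my-index-es7": {
            "shards": {
                "0": [{"segments": {"_0": {"num_docs": 1533347, "version": "8.2.0"}}}],
                "1": [{"segments": {"_0": {"num_docs": 1533347, "version": "8.2.0"}}}],
            }
        }
    }
}

result = segment_versions(sample)
print(result)
```

If the set for an index contains a single version, every segment of that index was written by the same Lucene version.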