Elasticsearch upgrade and migration from 2.3.4 to 6.2

Hi,
We are working on upgrading our Elasticsearch 2.3.4 environment to 6.2. Our system is a 6-node Elasticsearch 2.3.4 cluster that holds a large volume of logs in monthly indices and receives a high flow of incoming logs.
Since a direct migration from 2.3.4 to 6.2 is not possible, we are using a single Elasticsearch 5.6 node as an intermediate step.
In our tests we take snapshots of the monthly indices on Elasticsearch 2.3.4 one by one, restore each snapshot on the 5.6 node, and reindex it there. We then take a new snapshot on Elasticsearch 5.6 and restore it on Elasticsearch 6.2. This upgrade procedure works but takes a very long time, as the indices contain around 100 million documents. We observe that most of the time is spent in the reindexing step on the Elasticsearch 5.6 node.
We need help with two points:
(1) Can anyone please help us reduce the reindexing time on Elasticsearch 5.6?

(2) We are also observing that the migrated indices are larger than the original ones. Statistics for a few small indices are given below. Is this normal, or can it be improved?

| Index name | Source | Target | Source size (GB) | Documents | Target size (GB) |
|---|---|---|---|---|---|
| index-2018.09.09 | Elastic2 | Elastic6 | 8.4 | 27006984 | 19.7 |
| index-2018.09.06 | Elastic2 | Elastic6 | 1.4 | 3337538 | 3.4 |
| index-2018.09.07 | Elastic2 | Elastic6 | 1.8 | 4250861 | 4.0 |
| index-2018.09.08 | Elastic2 | Elastic6 | 3.8 | 11670007 | 9.0 |
| index-2018.09.10 | Elastic2 | Elastic6 | 3.6 | 10848770 | 8.5 |
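
To make the size increase easier to compare across indices, here is a small Python sketch that computes the growth factor and the average size per document from the figures in the table. The helper names are made up for illustration, and it assumes 1 GB = 1024^3 bytes.

```python
# (source size in GB, document count, target size in GB), copied from the table.
indices = {
    "index-2018.09.06": (1.4, 3337538, 3.4),
    "index-2018.09.07": (1.8, 4250861, 4.0),
    "index-2018.09.08": (3.8, 11670007, 9.0),
    "index-2018.09.09": (8.4, 27006984, 19.7),
    "index-2018.09.10": (3.6, 10848770, 8.5),
}


def growth_factor(source_gb, target_gb):
    """Return how many times larger the target index is than the source."""
    return target_gb / source_gb


def bytes_per_doc(size_gb, docs):
    """Average on-disk bytes per document (assumes 1 GB = 1024**3 bytes)."""
    return size_gb * 1024**3 / docs


for name, (src_gb, docs, dst_gb) in sorted(indices.items()):
    print(
        f"{name}: x{growth_factor(src_gb, dst_gb):.2f} "
        f"({bytes_per_doc(src_gb, docs):.0f} -> "
        f"{bytes_per_doc(dst_gb, docs):.0f} bytes/doc)"
    )
```

With these numbers every index grows by a factor of roughly 2.2 to 2.4, so the increase is consistent across indices and not tied to one particular day.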

Regards,
Angshuman

Hi Angshuman,

(1) Can anyone please help us reduce the reindexing time on Elasticsearch 5.6?

It'd be more efficient to reindex directly from your 2.3.4 cluster into your new 6.2.x cluster, skipping the 5.6 node entirely. You can use the reindex from remote feature to accomplish this.

Since you're creating entirely new indices on 6.x, there's no concern about Lucene level compatibility.
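
As a rough illustration of reindex from remote, the Python sketch below builds the `_reindex` request body and shows how it could be posted to the 6.2 cluster. The host names `es2-node1` and `es6-node1` are placeholders, the batch size of 1000 is only a starting point, and the function names are made up for this example. The 6.2 nodes also need the remote host listed under `reindex.remote.whitelist` in `elasticsearch.yml` before the request is accepted.

```python
import json
import urllib.request


def build_remote_reindex_body(remote_host, index, batch_size=1000):
    """Build a _reindex body that pulls `index` from a remote cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},  # the old 2.3.4 cluster
            "index": index,
            "size": batch_size,  # documents fetched per scroll batch
        },
        "dest": {"index": index},  # created on the new 6.2 cluster
    }


def start_reindex(dest_url, body):
    """POST the body to the destination cluster and return the parsed reply."""
    req = urllib.request.Request(
        dest_url + "/_reindex?wait_for_completion=false",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder host and index names; adjust to your environment.
    body = build_remote_reindex_body("http://es2-node1:9200", "index-2018.09.09")
    print(json.dumps(body, indent=2))
    # Uncomment to actually start the reindex on the 6.2 cluster:
    # print(start_reindex("http://es6-node1:9200", body))
```

With `wait_for_completion=false` the call returns a task ID right away, which can be polled through the Tasks API instead of holding an HTTP connection open for a 100-million-document index.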

(2) We are also observing that the migrated indices are larger than the original ones. Statistics for a few small indices are given below. Is this normal, or can it be improved?

In 5.0, we introduced a mapping change where we create a .keyword multi-field for text fields. If you're using dynamic mapping, this could certainly account for some of the additional footprint. You'll want to use an index template to pre-define your mappings before reindexing to avoid creating fields you don't want/need.
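
A minimal sketch of such a template, written as a Python dict that can be serialized and sent with `PUT _template/<name>` on the 6.2 cluster. The pattern `index-*`, the mapping type `doc`, and the `message` field are assumptions for illustration; the type name has to match the type of the documents being reindexed, and the field list has to come from your own data.

```python
import json


def build_log_template(pattern="index-*", doc_type="doc"):
    """Build an index template body that avoids text + .keyword duplication."""
    return {
        "index_patterns": [pattern],
        "mappings": {
            doc_type: {
                "dynamic_templates": [
                    {
                        # Any string field not listed under "properties"
                        # is indexed once, as keyword only.
                        "strings_as_keyword": {
                            "match_mapping_type": "string",
                            "mapping": {"type": "keyword", "ignore_above": 256},
                        }
                    }
                ],
                "properties": {
                    # Hypothetical full-text field: text only, no sub-field.
                    "message": {"type": "text"},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_log_template(), indent=2))
```

The template has to be in place before the destination index is created, since templates are only applied at index creation time.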

I'd also recommend following the tips in the "Tune for indexing speed" guide to make the reindexing process faster.
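
Two of the tips from that guide, disabling refresh and replicas during the initial load, can be sketched as settings bodies for `PUT /<index>/_settings`. The function names are made up for this example, and the values restored afterwards (`1s` refresh, 1 replica) are the Elasticsearch defaults, assumed here because the thread does not state the cluster's real settings.

```python
import json


def bulk_load_settings():
    """Settings to apply to the destination index before reindexing."""
    return {
        "index": {
            "refresh_interval": "-1",  # no periodic refresh during the load
            "number_of_replicas": 0,   # write each document only once
        }
    }


def restore_settings(refresh_interval="1s", replicas=1):
    """Settings to apply once the reindex has finished."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


if __name__ == "__main__":
    print("before:", json.dumps(bulk_load_settings()))
    print("after: ", json.dumps(restore_settings()))
```

Replicas are rebuilt from the primaries when the setting is restored, so the data ends up fully replicated again without having been indexed twice during the load.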

Hope this helps!

Thank you, Robbie.
I will apply your suggestions and update you with the results.

Regards,
Angshuman
