Reindexing to a single node cluster

I am in the process of migrating our old ES 2.4 cluster to a new ES 6.5 cluster.
The approach is the following:

  1. Set up a single node ES with storage large enough to hold my old cluster data w/o replication.
  2. Reindex the data from ES 2.4 to 6.5
  3. Once this is done and the code is prepared, switch to the new one.
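
The thread does not say how step 2 is carried out. One option, assuming the new node can reach the old cluster over HTTP, is the reindex-from-remote form of the `_reindex` API. The sketch below only builds the request body. The host `http://old-cluster.example:9200` and the index name `logs-2018` are hypothetical, and the old host would also need to be listed under `reindex.remote.whitelist` in the new node's elasticsearch.yml.

```python
import json


def build_remote_reindex_body(remote_host, source_index, dest_index, batch_size=1000):
    """Build the JSON body for POST /_reindex, sent to the new (6.5) node."""
    return {
        "source": {
            # The old 2.4 cluster is read over HTTP as a remote source.
            "remote": {"host": remote_host},
            "index": source_index,
            # Documents fetched per scroll batch from the remote cluster.
            "size": batch_size,
        },
        "dest": {"index": dest_index},
    }


# Hypothetical host and index names, for illustration only.
body = build_remote_reindex_body(
    "http://old-cluster.example:9200", "logs-2018", "logs-2018"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with any HTTP client. Reindexing one index at a time keeps each task small enough to monitor and retry.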

So far, I am 80+% through the reindexing.

The problem is that the single-node cluster cannot hold my data anymore. I am running out of memory.
I am giving ES 64 GB of heap (-Xms64g) and it seems that this is not enough.
I have 936 shards on my single node at the moment (I don't think this is too many, but the shards are big), and my heap is already at 55 GB.
The disk footprint is around 6TB atm.

The cluster is completely idle, apart from the indexing.

I have the feeling I am stuck with this approach.
Is there anything I am doing wrong here?

Would adding one (or two) more servers improve the heap usage? I am trying to avoid that, as it adds cost to the migration.

Any help would be appreciated.

That is too many shards for a single node. You should look to halve that number.

So adding more servers would help?
As the shards are going to be reassigned to other nodes...

Yes, but you should also re-evaluate your shard/index strategy.

The 64 GB heap also seems too much. It is recommended to keep the heap under 32 GB, and sometimes under 30 GB, as described in the documentation and in this blog post from Elastic.

How big are your shards? 6 TB of data across 936 shards works out to around 6.4 GB per shard. You can have bigger shards; something around 30 to 35 GB would not be a problem.
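
As a rough check of that arithmetic, here is a minimal sketch. It assumes 1 TB = 1000 GB and treats all data as one pool, which ignores the fact that shards belong to individual indices, so the real shard count would be somewhat higher.

```python
def average_shard_size_gb(total_data_gb, shard_count):
    """Average size of one shard, in GB."""
    return total_data_gb / shard_count


def shards_needed(total_data_gb, target_shard_gb):
    """Smallest number of shards that keeps each one at or below the target size."""
    # Ceiling division on integers.
    return -(-total_data_gb // target_shard_gb)


total_gb = 6000  # 6 TB on disk, taken from the thread

print(round(average_shard_size_gb(total_gb, 936), 1))  # 6.4
print(shards_needed(total_gb, 30))                     # 200
print(shards_needed(total_gb, 35))                     # 172
```

At 30 to 35 GB per shard, the same data would fit in roughly 170 to 200 shards instead of 936.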

This is not a production configuration. The idea is that once I have reindexed all the data, I will add more nodes to the cluster and reduce the Xmx. I am aware of Java's compressed pointers.
The number of shards is the problem right now.

My production cluster has 34 nodes and 6000 shards including the replica shards.
My new cluster is planned to have bigger machines but fewer nodes, for example 6 nodes.

This will leave me with ~1000 shards per node, which is too many.
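
The ~1000 figure comes from spreading 6000 shards over 6 nodes. The sketch below compares it with a per-node budget. The budget of 20 shards per GB of heap is a rule of thumb from Elastic's sharding guidance, not a number given in this thread, and the 30 GB heap is an assumption based on the advice above.

```python
def shards_per_node(total_shards, node_count):
    """Average number of shards (primaries plus replicas) each node holds."""
    return total_shards / node_count


def shard_budget(heap_gb, shards_per_gb_heap=20):
    """Rule-of-thumb upper bound on shards for one node (assumed guideline)."""
    return heap_gb * shards_per_gb_heap


per_node = shards_per_node(6000, 6)  # figures from the thread
budget = shard_budget(30)            # assumed 30 GB heap per node

print(per_node)           # 1000.0
print(budget)             # 600
print(per_node > budget)  # True
```

Under these assumptions the planned layout exceeds the budget, so the shard count would need to drop before moving to 6 nodes.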

Do the replica shards count toward the shards-per-node limit?

Yes, they do count, as they take up resources and are included in the cluster state. Read this blog post for some practical guidance on shards and sharding practices.
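
To see how primaries and replicas add up on each node, one option is to tally the output of `GET _cat/shards?h=index,shard,prirep,state,node`. This is a minimal sketch that parses such text. The sample rows and node names are made up for illustration.

```python
from collections import Counter


def count_shards_per_node(cat_shards_text):
    """Count assigned shards per node, primaries and replicas alike."""
    counts = Counter()
    for line in cat_shards_text.strip().splitlines():
        # Columns: index, shard, prirep, state, node.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # unassigned shards have no node column
        node = fields[4].strip()
        counts[node] += 1
    return dict(counts)


# Hypothetical sample output, not taken from the thread.
sample = """
logs-2018 0 p STARTED node-1
logs-2018 0 r STARTED node-2
logs-2018 1 p STARTED node-2
logs-2018 1 r STARTED node-1
logs-2019 0 p STARTED node-1
logs-2019 0 r UNASSIGNED
"""

print(count_shards_per_node(sample))  # {'node-1': 3, 'node-2': 2}
```

Each replica appears as its own row with `r` in the `prirep` column, so it is counted in the same way as a primary.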
