Reindexing to a single node cluster

Georgi_Ivanov · January 29, 2019, 1:57pm

Hello,
I am in the process of migrating our old Elasticsearch 2.4 cluster to a new ES 6.5 cluster.
The approach is the following:

Set up a single-node ES 6.5 cluster with enough storage to hold my old cluster's data without replicas.
Reindex the data from ES 2.4 to 6.5.
Once this is done and the application code is ready, switch over to the new cluster.
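The post doesn't say which mechanism is used for step 2. One common option is Elasticsearch's reindex-from-remote, where the new cluster pulls documents from the old one via `POST /_reindex`. The sketch below only builds that request body; the host `old-es24:9200` and the index name `logs-2018` are hypothetical placeholders, and the destination cluster would also need the remote host listed in its `reindex.remote.whitelist` setting.

```python
import json

# Hypothetical old-cluster address, for illustration only.
OLD_CLUSTER = "http://old-es24:9200"


def reindex_from_remote_body(remote_host: str, source_index: str,
                             dest_index: str, batch_size: int = 1000) -> dict:
    """Build a body for POST /_reindex that pulls documents from a remote cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},  # old 2.4 cluster
            "index": source_index,
            "size": batch_size,  # documents fetched per scroll batch
        },
        "dest": {"index": dest_index},  # index on the new 6.5 cluster
    }


# Hypothetical index name; send the printed JSON to the 6.5 node with any HTTP client.
body = reindex_from_remote_body(OLD_CLUSTER, "logs-2018", "logs-2018")
print(json.dumps(body, indent=2))
```

Running one such request per index (rather than one giant job) makes it easier to resume if the single node runs into trouble partway through.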

So far, I am more than 80% of the way through the reindexing.

The problem is that the single-node cluster cannot hold my data anymore: I am running out of memory.
I am giving ES a 64 GB heap (-Xms64g) and it seems that this is not enough.
I have 936 shards on my single node at the moment (I don't think this is too many, but the shards are big), and heap usage is already at 55 GB.
The disk footprint is currently around 6 TB.

The cluster is completely idle, apart from the indexing.

I have the feeling I am stuck with this approach.
Is there anything I am doing wrong here?

Would adding one (or two) more servers improve the heap usage? I am trying to avoid that, as it adds cost to the migration.

Any help would be appreciated.

warkolm · January 29, 2019, 1:59pm

That is too much, you should look to halve it.

Georgi_Ivanov · January 29, 2019, 2:28pm

So adding more servers would help?
The shards would then be reassigned to the other nodes...

warkolm · January 29, 2019, 9:27pm

Yes, but you should also re-evaluate your shard/index strategy.

leandrojmp · January 30, 2019, 12:37am

This also seems too much. It is recommended to keep the heap under 32 GB, sometimes under 30 GB, as you can see here in the documentation and in this blog post from Elastic.
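The sizing rule can be written down as a small helper. Elastic's guidance is to give the heap no more than half of the machine's RAM (the rest is used by the filesystem cache) and to stay below the compressed-oops cutoff near 32 GB. The 30 GB ceiling below mirrors the "sometimes under 30 GB" advice and is an assumption, not a fixed number; the function name is hypothetical.

```python
def recommended_heap_gb(ram_gb: float, oops_ceiling_gb: float = 30.0) -> float:
    """Heap = at most half of RAM, capped below the ~32 GB compressed-oops cutoff.

    oops_ceiling_gb is a conservative assumption; the exact cutoff depends on the JVM.
    """
    return min(ram_gb / 2, oops_ceiling_gb)


for ram in (32, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> -Xms{int(heap)}g -Xmx{int(heap)}g")
```

Under these assumptions a 64 GB heap would never be recommended, regardless of how much RAM the single node has.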

How big are your shards? With 6 TB of data across 936 shards, that works out to around 6.4 GB per shard. You can have bigger shards; something around 30 to 35 GB would not be a problem.
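The arithmetic behind that estimate, plus how many primaries the same data would need at the suggested larger shard sizes. This assumes 6 TB means 6000 GB (decimal units) and that data is spread evenly; in practice every index has at least one primary, so the real count depends on how many indices you keep.

```python
import math


def avg_shard_size_gb(total_gb: float, shard_count: int) -> float:
    """Average shard size, assuming data is evenly distributed."""
    return total_gb / shard_count


def shards_needed(total_gb: float, target_shard_gb: float) -> int:
    """Minimum number of primary shards to keep each shard at or under the target."""
    return math.ceil(total_gb / target_shard_gb)


total_gb = 6000  # ~6 TB, decimal units (assumption)
print(f"{avg_shard_size_gb(total_gb, 936):.1f} GB per shard")
for target in (30, 35):
    print(f"{target} GB shards -> {shards_needed(total_gb, target)} primary shards")
```

At 30 to 35 GB per shard, the same 6 TB fits in roughly 170 to 200 primaries instead of 936.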

Georgi_Ivanov · January 30, 2019, 8:31am

@leandrop
This is not a production configuration. The idea is that once I have reindexed all the data, I will add more nodes to the cluster and reduce the Xmx. I am aware of the Java compressed pointers limit.
The number of shards is a problem right now.

My production cluster has 34 nodes and 6000 shards including the replica shards.
My new cluster is planned to have bigger machines but fewer nodes, for example 6 nodes.

This will leave me with ~1000 shards per node, which is too much.
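A quick way to check the plan against a heap budget: divide total shards by node count and compare with a commonly cited rule of thumb from Elastic's sharding guidance, keeping shards per node below roughly 20 per GB of configured heap. The 20-per-GB figure and the 30 GB heap are assumptions here, not hard limits enforced by Elasticsearch.

```python
def shards_per_node(total_shards: int, nodes: int) -> float:
    """Average shards per node, assuming an even allocation."""
    return total_shards / nodes


def max_shards_for_heap(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Rule-of-thumb ceiling: about 20 shards per GB of heap (assumption)."""
    return int(heap_gb * shards_per_gb)


per_node = shards_per_node(6000, 6)  # production shard count on the planned 6 nodes
limit = max_shards_for_heap(30)      # assumed 30 GB heap per node
print(per_node, limit, per_node <= limit)
```

With these assumptions, 1000 shards per node is well over the ~600 a 30 GB heap would comfortably carry, which is why reducing the shard count matters more than adding heap.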

Do replica shards count toward the shards-per-node limit?

Christian_Dahlqvist · January 30, 2019, 8:35am

Yes, they do count, as they take up resources and are included in the cluster state. Read this blog post for some practical guidance on shards and sharding practices.
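Since each replica is a full shard copy, the number that matters for heap and cluster state is primaries times (1 + replicas). The thread doesn't state the replica count; the sketch assumes one replica, which would mean the production cluster's 6000 shards come from 3000 primaries. The function name is hypothetical.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary plus each of its replica copies is a separate shard
    that uses heap and appears in the cluster state."""
    return primaries * (1 + replicas)


print(total_shard_copies(3000, 1))  # production, assuming 1 replica
print(total_shard_copies(3000, 0))  # single migration node without replicas
```

Turning replicas back on after the migration doubles the shard count under this assumption, so the per-node budget has to account for it up front.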

system · February 27, 2019, 8:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
