Improve reindex speed into new cluster

I'm reindexing a 9.2TB index (~2bn documents) from a v2-created index (restored onto a v5.6 cluster) into a v5-created, 20-shard index on the same cluster. The Elasticsearch cluster consists of 9 nodes, each with 32 GiB RAM, 8 cores and a 4TB SSD.

It's taken about 12 days so far and seems to have slowed right down. My netdata dashboard shows that CPU is not being taxed at all, disk utilisation is up (as expected), and RAM usage is high.

The reindex batch size was 10,000, and everything runs through a Groovy script that converts the v5-incompatible document IDs into SHA-256 hashes of themselves. The reindexing rate on the destination index is being reported as ~300-400 docs/sec, and the query rate from the source index is <50/sec. index.refresh_interval is set to -1 (although I only did that today, after some more rooting around).
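
For anyone following along, the ID transformation itself is nothing exotic: each old ID is simply replaced with its SHA-256 hex digest. The real thing is a Groovy script inside the reindex request; the snippet below is only a standalone Python sketch of the equivalent logic, and the example ID is made up.

```python
import hashlib

def hash_doc_id(doc_id: str) -> str:
    # Replace a v2-era document ID with its SHA-256 hex digest so it is
    # safe to use as an _id in the v5-created destination index.
    return hashlib.sha256(doc_id.encode("utf-8")).hexdigest()

print(hash_doc_id("some awkward v2 id"))  # hypothetical example ID
```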

I'm a little bit worried. I've got another reindex process to run on a 5-shard index that has ~3.2bn documents in it, although it's only 2.3TB of data.
This is all taking much longer than expected.

My question is: if I roll another node into the cluster, will it adversely affect the reindex process? I'm assuming that as soon as another node becomes available, ES will start rebalancing the shards. If that conflicts with the reindex process I'll be extremely distraught!

Please advise, if possible (!).

I moved this one to its own topic. Your request is about reindex speed, but your setup is different, so it makes sense to ask questions specifically about it :slight_smile:


Adding a new node should be fine for reindex.

In general I recommend folks break up big reindex tasks into many smaller ones and manage them manually or with a simple bash script. Smaller tasks can be stopped and restarted and you get a real progress report as the small ones finish. If you have a date field in your source index it is usually fairly easy to reindex a day's worth of docs at a time. Or an hour. Or a month. It depends on the number of documents you have and how small you'd like your batches to be.
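As a rough illustration only: driving the slices from a small script could look something like the sketch below (Python rather than bash; the cluster address, index names, date field and date range are all made-up assumptions for your setup, and the ID-hashing script is omitted for brevity). Each call blocks until that day's slice has finished, so you get a progress line per day and can stop and resume wherever you like.

```python
import datetime as dt
import requests

ES = "http://localhost:9200"                    # assumption: cluster address
SOURCE, DEST = "old-v2-index", "new-v5-index"   # assumption: index names
DATE_FIELD = "@timestamp"                       # assumption: your date field

day = dt.date(2017, 1, 1)                       # assumption: range covered by the source
end = dt.date(2017, 12, 31)

while day <= end:
    body = {
        "source": {
            "index": SOURCE,
            "size": 10000,  # scroll batch size, as in the original reindex
            "query": {"range": {DATE_FIELD: {
                "gte": day.isoformat(),
                "lt": (day + dt.timedelta(days=1)).isoformat(),
            }}},
        },
        "dest": {"index": DEST},
    }
    # Blocks until this day's slice is done; the response includes doc counts.
    resp = requests.post(f"{ES}/_reindex", json=body)
    resp.raise_for_status()
    print(day, resp.json().get("total"), "docs")
    day += dt.timedelta(days=1)
```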


For information:

I changed the index settings so that there were 0 replicas. This made Kibana show a negative indexing rate for a while, but it brought the index memory and segment count right down.
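
In case it helps someone else, the settings change is a single call to the index settings API. A minimal sketch, assuming a local cluster and a destination index named new-v5-index, with the reverse call to restore replicas and refresh once the reindex finishes:

```python
import requests

ES = "http://localhost:9200"   # assumption: cluster address
INDEX = "new-v5-index"         # assumption: destination index name

# While the bulk reindex runs: no replicas, no periodic refresh.
requests.put(f"{ES}/{INDEX}/_settings",
             json={"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}).raise_for_status()

# Afterwards, put them back (values here are just the defaults).
requests.put(f"{ES}/{INDEX}/_settings",
             json={"index": {"number_of_replicas": 1, "refresh_interval": "1s"}}).raise_for_status()
```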

I also rolled in another node so that the shards were balanced evenly across the cluster (a 20-shard index across a 10-node cluster).

The indexing rate is back up to between 4,000 and 5,000 docs/sec, and it got through 100 million documents in 12 hours, which is much better.
