Reindex GC overhead

Hi,

We are trying to reindex one of our live indexes, which contains 9.6 million documents and is 48 GB in size.
Elasticsearch runs on a single machine with 36 CPU cores and 60 GB of RAM. I gave Elasticsearch a 30 GB heap (set in the jvm.options file).
We first tried this on a backup of this machine, and it worked perfectly: reindexing all the data took around 30 minutes.

Then we did the same on our production machine. It started out great, processing about 7 million documents in just under 30 minutes, but then it slowed down so much that a single batch took 10 minutes. Eventually Elasticsearch stopped returning results and the reindex seemed to have stalled, although the task was still active.
After cancelling the task I can see that the new index contains 8 million documents, so it was pretty close, but unfortunately the index is not usable like this.
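To put the slowdown in numbers, here is a quick calculation using only the figures above. It assumes "just under 30 minutes" is rounded to 30 minutes and that every batch holds exactly 5000 documents; the constant names are just for illustration.

```python
# Rough throughput figures from the numbers reported in this post.
# Assumption: "just under 30 minutes" is rounded to 30 minutes.
DOCS_FAST_PHASE = 7_000_000   # documents reindexed before the slowdown
FAST_PHASE_SECONDS = 30 * 60  # ~30 minutes
BATCH_SIZE = 5000             # documents per reindex batch
SLOW_BATCH_SECONDS = 10 * 60  # duration of one batch in the slow phase


def docs_per_second(docs: int, seconds: float) -> float:
    """Average indexing rate in documents per second."""
    return docs / seconds


fast_rate = docs_per_second(DOCS_FAST_PHASE, FAST_PHASE_SECONDS)
slow_rate = docs_per_second(BATCH_SIZE, SLOW_BATCH_SECONDS)
fast_batch_seconds = BATCH_SIZE / fast_rate  # time per batch before the slowdown
slowdown = fast_rate / slow_rate

print(f"fast phase: {fast_rate:.0f} docs/s, ~{fast_batch_seconds:.1f} s per batch")
print(f"slow phase: {slow_rate:.1f} docs/s")
print(f"slowdown:   ~{slowdown:.0f}x")
```

This works out to roughly 3889 docs/s (about 1.3 s per batch) at first, dropping to about 8.3 docs/s, a slowdown of around 467x.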

Some more info:

  • We are running ES 5.3.0
  • The new index has its refresh interval set to -1 to speed up indexing.
  • The reindex runs in batches of 5000 documents.
  • The machine uses about 450% CPU (100% = 1 core) and about 9 GB of RAM during the reindex.
  • I'm seeing a lot of these warnings:

"[2018-03-06T08:34:58,699][WARN ][o.e.m.j.JvmGcMonitorService] [BcXdUDQ] [gc][2744] overhead, spent [1.1s] collecting in the last [1.7s]"
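Read literally, the warning says the JVM spent 1.1 s of the last 1.7 s collecting garbage, i.e. about 65% of wall-clock time. A minimal sketch that extracts that fraction from such lines, assuming the message format matches the line quoted above and that both durations are given in seconds (other units such as "ms" are not handled):

```python
import re
from typing import Optional

# Pattern for the JvmGcMonitorService overhead warning quoted above.
# Assumption: both durations are reported in seconds ("s"), as in that line.
GC_OVERHEAD = re.compile(
    r"\[gc\]\[(?P<gc_id>\d+)\] overhead, spent \[(?P<spent>[\d.]+)s\] "
    r"collecting in the last \[(?P<window>[\d.]+)s\]"
)


def gc_overhead_fraction(line: str) -> Optional[float]:
    """Return the fraction of the interval spent in GC, or None if no match."""
    m = GC_OVERHEAD.search(line)
    if m is None:
        return None
    return float(m.group("spent")) / float(m.group("window"))


log_line = (
    "[2018-03-06T08:34:58,699][WARN ][o.e.m.j.JvmGcMonitorService] [BcXdUDQ] "
    "[gc][2744] overhead, spent [1.1s] collecting in the last [1.7s]"
)
fraction = gc_overhead_fraction(log_line)
print(f"{fraction:.0%} of the last interval was spent in garbage collection")
```

Running this over the whole log would show whether the GC share climbs as the reindex slows down.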

I assume these GC overhead warnings are the problem? Is there any way to prevent them and get the reindex through? I think the machine is powerful enough to handle this easily.
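For reference, the setup described in the list above (refresh disabled on the new index, batches of 5000) corresponds roughly to the two requests built below. This is a sketch, not the exact requests we sent: the index names are hypothetical, and `wait_for_completion=false` is an assumption that matches the reindex running as a task that could later be cancelled. The code only builds and prints the request bodies; it does not contact a cluster.

```python
import json
from typing import Dict, Tuple

# Hypothetical index names; the real ones are not given in this post.
SOURCE_INDEX = "old_index"
DEST_INDEX = "new_index"
BATCH_SIZE = 5000  # documents per scroll batch, as described above

Request = Tuple[str, str, Dict]


def disable_refresh_request(index: str) -> Request:
    """PUT /<index>/_settings request that turns off periodic refresh."""
    return "PUT", f"/{index}/_settings", {"index": {"refresh_interval": "-1"}}


def reindex_request(source: str, dest: str, batch_size: int) -> Request:
    """POST /_reindex request; source.size sets the number of docs per batch.

    Assumption: wait_for_completion=false, so Elasticsearch runs the reindex
    as a background task and returns a task id instead of blocking.
    """
    body = {
        "source": {"index": source, "size": batch_size},
        "dest": {"index": dest},
    }
    return "POST", "/_reindex?wait_for_completion=false", body


for method, path, body in (
    disable_refresh_request(DEST_INDEX),
    reindex_request(SOURCE_INDEX, DEST_INDEX, BATCH_SIZE),
):
    print(method, path)
    print(json.dumps(body, indent=2))
```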
