Cross-posted from elastic/elasticsearch#62278 ("use_real_memory setting causes GC collection issues") per request.
Elasticsearch version (bin/elasticsearch --version): 7.10.0
JVM version (java -version): openjdk version "11.0.5" 2019-10-15 LTS
OS version (uname -a if on a Unix-like system): RHEL 7
We're trying to upgrade from 5 to 7 (quite late, yes yes). Our old cluster was 1 node and our new one is 3, so there really is no reason this new setup shouldn't handle the old one's data. I pointed our ingestion pipeline at the new cluster, without copying any old data, and after a week or two it hit this issue. I cleared the cluster and tried copying over the smaller indices via Python's bulk helper, and I could consistently reproduce the issue. It looks like only one node is hitting the limit, but I basically never see old-gen GC run.
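For reference, the copy was done with chunked bulk requests along these lines (a minimal sketch, not our exact script; the function name, index name, and chunk size are illustrative):

```python
import json

def bulk_payloads(docs, index, chunk_size=500):
    """Build NDJSON bodies for the _bulk API in fixed-size chunks.

    Smaller chunks keep each individual request's accounted memory low,
    which matters once the parent circuit breaker starts tripping.
    """
    lines = []
    for doc in docs:
        # Each document is an action line followed by a source line.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
        if len(lines) >= 2 * chunk_size:
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"

# Each payload would then be POSTed to the cluster's /_bulk endpoint with
# Content-Type: application/x-ndjson; elasticsearch-py's helpers.bulk does
# the equivalent chunking internally.
```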
My manual copy test got through roughly 5 GB of data before I consistently ran into the issue (I did not manage to copy everything). There are 30 indices, usually set up with 5 primaries and 1 replica (I should probably drop that to 3 primaries or fewer for this test cluster, but I don't think it would break things this badly). Each node has 80 to 90 segments. In theory there should be tons of runway here; there's no reason for a node to be using this much heap.
Our circuit breaker was at 50% when I first saw the issue (a holdover from our 5.x config). I raised it to 75%, then did the manual copy test, and I still see the issue consistently.
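For reference, the limit was raised dynamically via the cluster settings API, something like this (body from memory):

```
PUT _cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "75%"
  }
}
```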
I was not writing to any node in particular, afaik; we've set up an alias that should round-robin over the nodes.
Notice we are using CMS (default jvm.options + Java 11 = CMS). This thread has a similar issue, but everyone there fixed it by moving to CMS.
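For context, the default jvm.options in 7.x selects the collector by JDK version, so a stock install on Java 11 gets CMS (paraphrasing the shipped file; check your copy for the exact lines):

```
## GC configuration (as shipped in jvm.options)
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
```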
I then set use_real_memory to false and turned the circuit breaker back down to 50%. The cluster was able to handle this, and old-gen GC does run, but heap usage is still very high for the 9.2 GB of data I ended up testing with (we have an index that is a few hundred GB which I did not test with, since copying it would take forever).
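Note that indices.breaker.total.use_real_memory is a static setting, so unlike the limit it had to go into elasticsearch.yml on each node, followed by a restart:

```yaml
# elasticsearch.yml on each node
indices.breaker.total.use_real_memory: false
indices.breaker.total.limit: 50%
```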
I didn't pull stats directly here, but this is the graph from when we set up the cluster to when we noticed the issue. This is probably just a handful of GB (less than 10) over these weeks; we didn't backfill our data, this is just changes to our data.
Heap usage eventually gets to 50%, and the bulk API starts throwing circuit breaking exceptions.
While testing with use_real_memory set to true:
Heap usage was at this level for days after the test. Old GC did not run.
The node was stuck at the 50% circuit breaker. I restarted with 75%, then ran my test and hit the breaker via the bulk API. Two nodes go back down, but one node stays hot. Even querying _nodes/stats/jvm can cause the circuit breaker exception.
After setting to false:
Heap usage is still very high for this amount of data, afaik, but I can actually see it GC in some graphs. The circuit breaker was never triggered, despite the heap going over the 50% limit; I assume the stats here use a different calculation than the circuit breaker does.
Note that after my experiment, memory usage is pretty constant. Not sure if this is expected.
You can see I restarted the cluster, heap went to basically 0, then I started the test, after which heap goes up and stays up.
I also tried sending more data to our cluster. Our job ran here and ingested 32 GB of data. The circuit breakers weren't hit, and it looks like GC is running. So maybe the high-heap-usage-with-no-traffic is expected? It also looks like our jobs didn't hit errors with the bulk API here, so we didn't trip circuit breakers (still 50% with use_real_memory false).
Let me know if you need more information; I'd be happy to provide it. I'm also not sure what is expected here; maybe heap usage is meant to be quite high with ES and GC is only meant to run infrequently. At least it seems like with a 50% circuit breaker and use_real_memory set to false, the cluster is usable, even if heap does seem high.