Inconsistent young GC durations

Hi there.

I was wondering whether anybody has run into behavior similar to the following:

We are running performance benchmarks against an isolated cluster of 3 ES client nodes and 4 ES data nodes, sized with 64GB RAM (16GB for the ES heap), a 1TB SSD, and 8 cores.

Data has been loaded to mirror our production environment: disks are about 50% full and the shard-to-core ratio matches production (1.5 shards per core). We are using ES 2.3.3 with doc values.
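Spelled out, the sizing above implies the following per-node numbers. This is a minimal sketch that assumes the 1.5 ratio means shard copies (primaries and replicas) per data-node core. The post does not say which, so treat that interpretation as an assumption.

```python
# Figures quoted in the post. Treating SHARDS_PER_CORE as "shard copies per
# data-node core" is our assumption, not something the post states.
DATA_NODES = 4
CORES_PER_DATA_NODE = 8
SHARDS_PER_CORE = 1.5
RAM_GB_PER_NODE = 64
HEAP_GB_PER_NODE = 16

shards_per_node = CORES_PER_DATA_NODE * SHARDS_PER_CORE   # 12.0
shards_in_cluster = shards_per_node * DATA_NODES          # 48.0
heap_fraction = HEAP_GB_PER_NODE / RAM_GB_PER_NODE        # 0.25
off_heap_gb = RAM_GB_PER_NODE - HEAP_GB_PER_NODE          # 48 GB left to the OS and filesystem cache

print(f"{shards_per_node:.0f} shards per data node, {shards_in_cluster:.0f} in total")
print(f"heap is {heap_fraction:.0%} of RAM; {off_heap_gb} GB outside the heap")
```

If the ratio counts only primaries, the per-node figure would be higher once replicas are added.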

With a single user querying the cluster at 5-second intervals, response times are generally more than acceptable. However, at the 98th percentile we see spikes of roughly 10x or more, for instance from 100ms to 1.5s. We have been able to correlate these spikes with GC events in the logs:
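A few slow requests are enough to move a tail percentile. The sketch below uses the nearest-rank percentile definition on a hypothetical sample of 100 latencies, not the benchmark's real data. In that sample, 3 requests are assumed to have overlapped a GC pause.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in ms: 97 normal requests and 3 that hit a GC pause.
latencies_ms = [100] * 97 + [1500] * 3
print(percentile(latencies_ms, 50))  # 100
print(percentile(latencies_ms, 98))  # 1500
```

With only 2 slow requests out of 100, the same p98 call returns 100. A p98 figure therefore depends heavily on how often pauses happen to coincide with requests.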

[2016-12-08 15:26:26,655][WARN ][monitor.jvm ] [machine] [gc][young][743843][6109] duration [3s], collections [1]/[3.1s], total [3s]/[1.2h], memory [7.5gb]->[7gb]/[14.6gb], all_pools {[young] [532.5mb]->[167.6kb]/[532.5mb]}{[survivor] [10.1mb]->[8.6mb]/[66.5mb]}{[old] [7gb]->[7gb]/[14gb]}
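To compare this pause against the collector's history, the line can be parsed. This is a minimal sketch that relies on our own reading of the ES 2.x `monitor.jvm` format, not an official specification. Under that reading, `[743843]` is a monitor sequence number and `[6109]` is the young collector's cumulative collection count. In `total [3s]/[1.2h]`, the first value is GC time in this interval and the second is cumulative GC time. The regex and function names are hypothetical.

```python
import re

# Assumed layout: [gc][<collector>][<monitor seq>][<cumulative count>] duration [...]
# ... total [<time this interval>]/[<cumulative collection time>]
GC_LINE = re.compile(
    r"\[gc\]\[(?P<collector>\w+)\]\[(?P<seq>\d+)\]\[(?P<count>\d+)\]\s+"
    r"duration \[(?P<duration>[^\]]+)\].*?"
    r"total \[(?P<interval_total>[^\]]+)\]/\[(?P<cumulative_total>[^\]]+)\]"
)

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}
_TIME_VALUE = re.compile(r"([\d.]+)(ms|s|m|h|d)")

def to_seconds(value: str) -> float:
    """Convert a time value such as '3s' or '1.2h' to seconds."""
    match = _TIME_VALUE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]

def parse_gc_line(line: str) -> dict:
    """Extract pause and cumulative GC figures from one monitor.jvm warning line."""
    match = GC_LINE.search(line)
    if match is None:
        raise ValueError("not a monitor.jvm gc line")
    count = int(match["count"])
    cumulative = to_seconds(match["cumulative_total"])
    return {
        "collector": match["collector"],
        "count": count,
        "pause_s": to_seconds(match["duration"]),
        "cumulative_s": cumulative,
        # Lifetime mean pause; approximate because the cumulative total is rounded.
        "mean_pause_s": cumulative / count,
    }

if __name__ == "__main__":
    sample = ("[2016-12-08 15:26:26,655][WARN ][monitor.jvm ] [machine] "
              "[gc][young][743843][6109] duration [3s], collections [1]/[3.1s], "
              "total [3s]/[1.2h], memory [7.5gb]->[7gb]/[14.6gb]")
    print(parse_gc_line(sample))
```

On this line, the mean young pause works out to roughly 0.7 s (4320 s / 6109), compared with 3 s for this collection. Because `1.2h` is rounded, treat the mean as approximate.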

It appears that young GC is taking longer than expected. From some further reading, could this be because scanning references from the old generation into the young generation takes a long time?

Are there any suggestions for avoiding these inconsistent young GC pauses, since they drive up our search latency? Could we be giving ES too much heap? I also read that we might be GC'ing too often, so objects are promoted to the old generation too quickly. Could that lead to longer GC times while the old-to-young references are updated?

Just curious whether anyone has seen something similar.

Thanks for the input.

A couple of additional notes:

  • Default Java settings
  • Basic term and match searches over a range.
  • mlockall enabled
  • swap off
  • read heavy workload

Ryan