GC Failures


#1

Hi,
we have a cluster v6.3 (java 1.8) and from time to time we see a problem with the gc on the data nodes.
The rate drops to zero and the jvm heap usage goes up to a 100% (which cause the cluster to stop serving).
The data nodes have are 30gb ram and half is allocated to the heap.

Node configuration:
indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 30%
indices.queries.cache.size: 30%
indices.requests.cache.size: 5%
indices.queries.cache.count: 5000000

GC logs:
2018-11-21T06:20:30.382+0000: 74173.669: [CMS-concurrent-mark-start]
2018-11-21T06:20:32.516+0000: 74175.803: [CMS-concurrent-mark: 2.134/2.134 secs] [Times: user=9.62 sys=0.01, real=2.14 secs]
2018-11-21T06:20:32.516+0000: 74175.804: [CMS-concurrent-preclean-start]
2018-11-21T06:20:37.085+0000: 74180.372: [Full GC (Allocation Failure) 2018-11-21T06:20:37.085+0000: 74180.372: [CMS2018-11-21T06:20:37.360+0000: 74180.647: [CMS-concurrent-preclean: 4.842/4.844 secs] [Times: user=7.17 sys=0.05, real=4.84 secs]
(concurrent mode failure): 14326205K->14326207K(14326208K), 15.0557023 secs] 15323005K->14437097K(15323008K), [Metaspace: 84592K->84592K(1130496K)], 15.0559302 secs] [Times: user=15.07 sys=0.00, real=15.06 secs]
2018-11-21T06:20:52.141+0000: 74195.428: Total time for which application threads were stopped: 15.0567299 seconds, Stopping threads took: 0.0001486 seconds
2018-11-21T06:20:54.141+0000: 74197.428: [GC (CMS Initial Mark) [1 CMS-initial-mark: 14326207K(14326208K)] 14749195K(15323008K), 0.0340097 secs] [Times: user=0.34 sys=0.00, real=0.03 secs]
2018-11-21T06:20:54.175+0000: 74197.463: Total time for which application threads were stopped: 0.0348284 seconds, Stopping threads took: 0.0001442 seconds

Jvm usage \ Gc rate graphs:



(Byron Voorbach) #2

Do you see anything weird in the standard ES logs? Both for master and data nodes?


(Junaid) #3

First thing is that we need to understand the high heap usages seen for the above case. GC failures are expected if heap usage approaches 100% mark. For more details, I would recommend you to go through this article.

There are a few questions to understand your problem.

  1. What is the size of the cluster and its topology (number of master & data nodes)?
  2. How many shards the cluster is hosting and what is the distribution per node?
  3. What is the mapping size?
  4. What is the output of _nodes/stats at the time of high heap usage?

(Itay Bittan) #4

Thanks guys, I'll answer for Amit (we are on the same team).
@byronvoorbach we see a lot of query in slowlog in the effected nodes.
@mjunaidmuzammil:

  1. 24 nodes: 3 master nodes, 12 data nodes (serving only, indices are read-only), 4 data nodes (indexing only), 5 client nodes.
  2. 1,700 indices, 8,800 shards, 800M docs. 700 shards per node (on each one of the 12 data serving nodes).
  3. what do you mean by mapping size? where can i get this info?
  4. the graphs above was generated by the _nodes/stats output. nothing abnormal except that.

Thanks,
Itay


#5

Just adding that when it happens we also see process_cluster_event_timeout_exception in the master logs.


(Junaid) #6

@Itay_Bittan Slow GC will impact all other operations on the cluster. I see you have changed some of the cache settings (indices.memory.index_buffer_size is increased to 30%, indices.queries.cache.size to 30%, both defaults to 10%), indices.requests.cache.size is increased to 5%, default value is 1%.

Can you check what are the sizes of these individual caches? You can try fetching individual cache sizes using _nodes/stats/indices?pretty. It might be the case that you are over allocating the caches.

Also How big the cluster_state is? Cluster_state size is proportional to the mapping size.

What is an average size of shard in your ES cluster? Do you have bigger shards greater than 40G?


#7

Thanks for the answer!

  1. The cache size sums up to ~5gb on each data node.
  2. cluster state compressed size is about 2mb.
  3. The biggest shard we have is 15gb and the avg size is about 1gb.

(Junaid) #8

@Amit_Michelson @Itay_Bittan Looks like you are over allocating shards. You have 15G of heap memory available over each node, which means that there are (700/15) = 47 shards / 1G heap. It is recommended to have 20-25 shards per 1G heap which means for 15G, shard count should be 300-375 shards. More details can be found at this link. Also the more heap per node you are allocating for caching & other stuff, the less number of shards that node can accommodate.

For the current situation, you need to either reduce the number of shards or increase the number of nodes to accommodate the existing shard count.

Hope this helps you out.


(Christian Dahlqvist) #9

As you seem to have indices that are read-only, I would recommend force merging them down to a single segment if you are not already doing so. As discussed in this webinar, this has the potential to reduce your heap pressure.


(Itay Bittan) #10

Thanks guys.
To be more accurate:
total cluster size: 750GB
Every one of the serving data nodes hold ~710 shards.
Every one of the indexing data nodes hold ~85 shards.
Serving indices replication_factor = 6
Serving indices are being replaced once (or twice) a day.
Our indices sizes are different, it can be from few MB to 15 GB.
@Christian_Dahlqvist we are doing force merging before relocating index from index zone to serving zone. We have interesting issue there BTW: it seems like the size of the index increased after this operation (while we expected the opposite, since we have a lot of deleted docs).
we are doing it with only_expunge_deletes=False