Masters are ramping up GC times, using more heap every minute

I am running a cluster with 20 data nodes (9 TB each, 32 GB heap), 3 client nodes (24 GB heap), and 3 master nodes (20 GB heap). The cluster has been running stably under monitoring for the past 90 days. Yesterday, nowhere near a UTC rollover (which would create new daily indices), we started seeing ramped-up GC time on the masters. The GCs are more frequent and take longer; here's a graph:

The two sawtooth drops are from my manual restarts of the master. The newly elected masters show the same memory growth.

Here is the time spent in GC:

What can I do to debug this issue? In the past this kind of thing has been caused by out-of-control schema growth from poorly processed data, but I'm not sure that's the case here. How can I track down how memory is being used on the master?
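One rough way to watch this from the API side (a minimal sketch, assuming Python with the `requests` library; the endpoint URL is a placeholder, adjust for your cluster) is to poll the JVM section of the nodes-stats API on just the master-eligible nodes, and use the size of the cluster-state metadata as a crude proxy for mapping bloat, since every master-eligible node holds that state on heap:

```python
import requests

ES = "http://localhost:9200"  # assumed endpoint; adjust for your cluster

# JVM heap usage and GC counters, limited to master-eligible nodes.
stats = requests.get(f"{ES}/_nodes/master:true/stats/jvm").json()
for node_id, node in stats["nodes"].items():
    jvm = node["jvm"]
    old_gc = jvm["gc"]["collectors"]["old"]
    print(f"{node['name']}: heap {jvm['mem']['heap_used_percent']}%, "
          f"old-gen GCs {old_gc['collection_count']}, "
          f"old-gen GC time {old_gc['collection_time_in_millis']} ms")

# Rough proxy for mapping/cluster-state bloat: the serialized size of the
# metadata section of the cluster state.
state = requests.get(f"{ES}/_cluster/state/metadata")
print("cluster state metadata bytes:", len(state.content))
```

Polling this every minute or so and graphing the old-gen GC time alongside the cluster-state size makes it fairly obvious whether the heap growth tracks mapping growth.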

It turns out one of our analysts was logging Bro data directly to the cluster and had turned it on right when we saw this issue start. Disabling their data flow got the master under control again.

We are still trying to figure out why the Bro IDS data induces GC pressure on the masters.

I've seen this kind of thing when there is a field explosion due to inadvisably structured documents being indexed, i.e. something that should be a field value appears in the documents as a field name, causing a high rate of dynamically added fields in your mapping. A particular tell for this (in addition to looking at the mapping and seeing thousands of fields) is that heap usage is extraordinarily high even on the non-elected masters.
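To check for that, something like the sketch below can count how many fields each index mapping declares (an illustrative example only; it assumes Python with `requests`, the typed pre-7.x mapping layout, a placeholder endpoint, and an arbitrary 1000-field threshold):

```python
import requests

ES = "http://localhost:9200"  # assumed endpoint; adjust for your cluster

def count_fields(properties):
    """Recursively count fields declared under a mapping 'properties' block."""
    total = 0
    for field in properties.values():
        total += 1
        # Object fields and multi-fields carry their own sub-definitions.
        for key in ("properties", "fields"):
            if key in field:
                total += count_fields(field[key])
    return total

# Typed mapping layout (index -> mappings -> type -> properties), as used by
# Elasticsearch versions of this era; 7.x+ drops the type level.
mappings = requests.get(f"{ES}/_mapping").json()
for index, body in mappings.items():
    for doc_type, mapping in body.get("mappings", {}).items():
        n = count_fields(mapping.get("properties", {}))
        if n > 1000:  # arbitrary threshold for flagging suspicious indices
            print(f"{index}/{doc_type}: {n} fields")
```

Running that against the daily Bro indices would quickly show whether dynamic mapping is creating field names out of values (IPs, ports, filenames, etc.).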
