Huge documents - are these to blame for our Young GC problem?

Henrik_Ossipoff_Hans · July 24, 2017, 1:39pm

I'll admit that I think I already know the answer to this, but I'm looking for confirmation, I guess.

We're seeing excessive young GC activity. Our setup is roughly the following:

3 data nodes (24 GB RAM, 12 GB heap)
3 master nodes (8 GB RAM, 4 GB heap)

I realize that a lot of young GC is normal, but we see heap usage jump from 5% to 75% in maybe 30-60 seconds, resulting in GC runs lasting up to 10 seconds in some cases. Ten-second GC runs every few minutes are not nice when they happen across all 3 nodes: we get many timeouts.
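As a rough sanity check, the numbers above can be turned into a heap fill rate. The sketch below assumes the whole 5% to 75% jump is new allocation on a 12 GB heap (treated as 12 GiB) and, for the per-document figure, assumes indexing is the only thing allocating, which is a simplification (searches and aggregations allocate too). The function names are hypothetical, for illustration only.

```python
HEAP_MIB = 12 * 1024  # 12 GB data-node heap from the post, expressed in MiB


def fill_rate(heap_mib: float, start: float, end: float, seconds: float) -> float:
    """Approximate heap fill rate in MiB/s for a jump from `start` to `end` (fractions of the heap)."""
    return heap_mib * (end - start) / seconds


def per_doc_allocation(rate_mib_s: float, docs_per_s: float) -> float:
    """MiB allocated per indexed document, ASSUMING indexing is the only allocator."""
    return rate_mib_s / docs_per_s


fast = fill_rate(HEAP_MIB, 0.05, 0.75, 30)  # jump completes in 30 s
slow = fill_rate(HEAP_MIB, 0.05, 0.75, 60)  # jump completes in 60 s
print(f"heap fill rate: ~{slow:.0f} to ~{fast:.0f} MiB/s")
# Lowest estimate: slow fill at the highest ingest rate; highest: fast fill at the lowest rate.
print(f"per document: ~{per_doc_allocation(slow, 30):.1f} to ~{per_doc_allocation(fast, 10):.1f} MiB")
```

With 10-30 documents per second this works out to several megabytes of allocation per document. If that looks implausibly high for your documents, it may be worth checking other sources of allocation before blaming document size alone.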

I would say our ingest rate is fairly moderate, maybe 10-30 documents per second.

However, I have to confess that we have some very large documents with a lot of fields. So many, in fact, that we've had to raise the default limit on the number of fields, which makes me think we're doing something very wrong.

Is it normal for very large documents (in terms of field count) to cause this sort of behaviour?

If so, how do people deal with this? Split the data into multiple indices?

Mark_Harwood · July 24, 2017, 1:53pm

How many fields do you have now?

When you add documents with never-before-seen fields, the index's mapping definition has to change. That requires coordination with the master node to revise the schema, and the revised schema then has to be disseminated to all the other nodes.
This clearly adds overhead to what would otherwise be a straight write of a document's contents on a data node. Declaring index fields up front, or avoiding the need to introduce new fields in your JSON, will help matters.
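One way to act on this advice is to declare the fields up front with `"dynamic": "strict"`, so documents containing unmapped fields are rejected instead of triggering a mapping update, and to check documents client-side before sending them. The sketch below is a minimal illustration, assuming a hypothetical mapping (`title`, `customer.id`, `customer.name` are made-up field names). It ignores multi-fields and other mapping features, and treats arrays as multi-valued fields.

```python
from typing import Any, Dict, Iterator, Set

# Hypothetical mapping body; "dynamic": "strict" rejects documents with unmapped fields.
MAPPING: Dict[str, Any] = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "customer": {
            "properties": {
                "id": {"type": "keyword"},
                "name": {"type": "text"},
            }
        },
    },
}


def mapped_paths(properties: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Collect the dotted leaf paths declared under a mapping's "properties"."""
    paths: Set[str] = set()
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:  # object field: recurse into its sub-fields
            paths |= mapped_paths(spec["properties"], path + ".")
        else:
            paths.add(path)
    return paths


def _walk(value: Any, path: str) -> Iterator[str]:
    """Yield dotted leaf paths for a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from _walk(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for item in value:  # arrays map to the same field as their elements
            yield from _walk(item, path)
    else:
        yield path


def new_fields(doc: Dict[str, Any], mapping: Dict[str, Any]) -> Set[str]:
    """Return the field paths in `doc` that the mapping does not declare yet."""
    return set(_walk(doc, "")) - mapped_paths(mapping["properties"])


doc = {"title": "Order", "customer": {"id": "42", "email": "a@example.com"}}
print(new_fields(doc, MAPPING))
```

Here `customer.email` would be reported, and with a strict mapping that document would be rejected rather than silently growing the schema.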

Henrik_Ossipoff_Hans · July 24, 2017, 2:05pm

My guess is that our mapping has between 6,000 and 8,000 fields, with a single document having around 1,200 to 1,500 fields. I know this is most likely far too many.
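These figures imply that each document populates only a fraction of the mapping. The sketch below does that arithmetic and gives an approximate way to count mapped fields. It assumes, as an approximation of how `index.mapping.total_fields.limit` is applied, that object fields and multi-fields each count as one field. The example mapping is hypothetical.

```python
from typing import Any, Dict

# Hypothetical mapping "properties" block, for illustration only.
EXAMPLE_PROPERTIES: Dict[str, Any] = {
    "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
    "customer": {
        "properties": {
            "id": {"type": "keyword"},
            "name": {"type": "text"},
        }
    },
}


def count_mapping_fields(properties: Dict[str, Any]) -> int:
    """Approximate mapped-field count: each field or object, plus its multi-fields."""
    total = 0
    for spec in properties.values():
        total += 1  # the field itself (leaf or object)
        total += len(spec.get("fields", {}))  # multi-fields such as "title.raw"
        if "properties" in spec:  # recurse into object sub-fields
            total += count_mapping_fields(spec["properties"])
    return total


def fill_fraction(doc_fields: int, mapping_fields: int) -> float:
    """Fraction of the mapping that a single document populates."""
    return doc_fields / mapping_fields


print(count_mapping_fields(EXAMPLE_PROPERTIES))
# Range from the post: 1,200-1,500 fields per document, 6,000-8,000 in the mapping.
print(f"{fill_fraction(1200, 8000):.0%} to {fill_fraction(1500, 6000):.0%}")
```

On the post's numbers, a single document fills roughly 15% to 25% of the mapping, so most documents leave most mapped fields empty.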

I take your point, but we seem to see the same GC pattern even after all of our documents have been indexed, when no new fields are being added.

system · August 21, 2017, 2:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
