I'll admit that I think I already know the answer to this problem, but I'm looking for confirmation, I guess.
We're seeing excessive Young GC "problems". Our setup is more or less the following:
3 data nodes (24 GB RAM, 12 GB heap)
3 master nodes (8 GB RAM, 4 GB heap)
I realize that a lot of young GC is normal, but we see heap usage jump from 5% to 75% in maybe 30-60 seconds, resulting in GC runs of up to 10 seconds in some cases. 10-second GC runs every few minutes aren't very nice, and when they happen across all 3 nodes we get many timeouts.
I would say we have a very moderate influx of data, maybe 10-30 documents per second.
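To put rough numbers on this, here is a back-of-the-envelope sketch using only the figures above (12 GB heap, 5% to 75% in 30-60 seconds, 10-30 documents per second). The function names are made up for illustration, and it assumes that indexing is responsible for all of the allocation, which ignores searches, merges and anything else running on the node, so treat the result as an upper bound rather than a measurement.

```python
# Rough allocation estimate from the figures in this post.
# Assumption: every byte allocated during a heap jump is caused by indexing.

HEAP_GB = 12.0           # heap size of a data node
LOW, HIGH = 0.05, 0.75   # observed heap usage before and after a jump


def allocation_rate_mb_per_s(window_s: float) -> float:
    """MB of heap filled per second when usage climbs LOW -> HIGH in window_s."""
    return HEAP_GB * 1024 * (HIGH - LOW) / window_s


def mb_per_document(window_s: float, docs_per_s: float) -> float:
    """Heap allocated per indexed document under the assumption above."""
    return allocation_rate_mb_per_s(window_s) / docs_per_s


if __name__ == "__main__":
    for window_s in (30, 60):
        rate = allocation_rate_mb_per_s(window_s)
        print(f"{window_s}s window: about {rate:.0f} MB/s allocated")
        for docs_per_s in (10, 30):
            per_doc = mb_per_document(window_s, docs_per_s)
            print(f"  at {docs_per_s} docs/s: about {per_doc:.1f} MB per document")
```

That works out to somewhere around 140 to 290 MB/s, or very roughly 5 to 29 MB of heap per document, which seems like a lot for such a moderate ingest rate.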
However, I have to confess that we have some very large documents with a lot of fields. So many, in fact, that we've had to raise the default limit on the number of allowed fields, which leads me to think we're doing something very bad.
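For anyone wanting to check how wide their own documents are, here is a minimal sketch that counts the distinct field paths in a JSON document. I'm assuming the limit in question is the `index.mapping.total_fields.limit` setting (default 1000). The helper `field_paths` and the sample document are hypothetical, and the count is only an approximation: as far as I know Elasticsearch also counts object mappings and multi-fields, so the real number will be higher.

```python
from typing import Any, Set


def field_paths(value: Any, prefix: str = "") -> Set[str]:
    """Collect the dotted paths of all leaf fields in a JSON-like value.

    Objects inside arrays share the array's path, so repeated elements
    do not inflate the count.
    """
    if isinstance(value, dict):
        paths: Set[str] = set()
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths |= field_paths(child, child_prefix)
        return paths
    if isinstance(value, list):
        paths = set()
        for item in value:
            paths |= field_paths(item, prefix)
        return paths
    # Scalar (or null): this is a leaf field, unless we are at the root.
    return {prefix} if prefix else set()


if __name__ == "__main__":
    # Hypothetical sample document, not taken from our real data.
    doc = {
        "user": {"name": "a", "tags": ["x", "y"]},
        "events": [{"type": "t", "ts": 1}, {"type": "u", "size": 3}],
    }
    paths = field_paths(doc)
    print(len(paths), sorted(paths))
```

Running this over a sample of real documents and taking the union of the paths should give a lower bound on how many fields the mapping has to hold.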
Is it normal for very large documents (in terms of fields) to cause this sort of behaviour?
If so, how do people deal with these issues? Split the data into multiple indices?