This is a sort of follow-up to the G1GC question.
We take a snapshot to S3 at 5 minutes past each hour, and we have noticed GC pauses at that time that are long enough to impact our indexing (indexing requests time out). New Relic tells us these pauses are mainly caused by old-generation collections, which shoot up to 400ms. Our young-generation GC is not great either: pauses are constantly around 50-200ms and spike to 300ms.
We have a new ES cluster running 5.6.4 that is only indexing data (we use a Groovy script for upserts because of our use of nested documents). Performance is otherwise good: CPU is acceptable (no node over 80%) and we index around 5k docs/s. We do have a good number of shards (60), and they are somewhat small (2GB), on 9 data-only nodes. Any idea why GC spikes, and why that would cause our indexing to back up and time out?