Shards getting marked as stale frequently causing cluster to go yellow

Faiz_Ahmed_Mushtak_H · May 11, 2020, 7:23pm

I meant it's an array of strings (with duplicates). Eventually a string will be in memory once the JSON is parsed, right (for the ctx._source), whether it's the document or the request that is sent? That's what I was talking about.

And yes, the JSON object itself could grow too.

Faiz_Ahmed_Mushtak_H · May 13, 2020, 7:12pm

Reducing the JSON size seems to have done the trick. We'll observe for a few more days. But before that, can you tell me what the impact of increasing G1HeapRegionSize is? What should I expect if I adjust it from 8 MB to 16 MB? I'll go through our GC logs, see how many humongous regions are being created, and compare that with previous incidents to verify whether we should even tweak the region size.
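For reference, here is a rough sketch of the arithmetic behind that question. G1 treats an allocation of about half a region or more as humongous and gives it whole contiguous regions of its own. The helper names and the 5 MB object are made up for illustration, and the exact boundary comparison at precisely half a region is left to the JVM implementation.

```python
MB = 1024 * 1024


def humongous_threshold_bytes(region_size_bytes: int) -> int:
    """Approximate size at which G1 starts treating an allocation as humongous."""
    return region_size_bytes // 2


def is_humongous(object_bytes: int, region_size_bytes: int) -> bool:
    # Assumption: the threshold itself is treated as humongous; the JVM's exact
    # comparison at the boundary may differ.
    return object_bytes >= humongous_threshold_bytes(region_size_bytes)


def regions_needed(object_bytes: int, region_size_bytes: int) -> int:
    """Whole regions occupied by a humongous object (ceiling division)."""
    return -(-object_bytes // region_size_bytes)


# Hypothetical 5 MB string array, checked against both region sizes.
obj = 5 * MB
for region in (8 * MB, 16 * MB):
    print(region // MB, "MB regions:",
          "humongous" if is_humongous(obj, region) else "regular allocation")
```

With these made-up numbers the 5 MB object is humongous under 8 MB regions (threshold 4 MB) but an ordinary allocation under 16 MB regions (threshold 8 MB). The trade-off is that the same heap is split into half as many regions.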

Faiz_Ahmed_Mushtak_H · May 16, 2020, 11:01am

Almost a week without the circuit breaker tripping. Looks like it has helped!

Just a minor concern: do you happen to know how G1GC behaves when snapshots are taken? We see a spike in our young GC time (it goes up to 500ms-1s). We also see the number of young GCs increase, so maybe the spike in time is a consequence of the GC frequency.

Elasticsearch, however, wasn't affected in any noticeable way. Does the young gen time reported in the node stats (most likely via the JVM APIs) include time spent in concurrent phases, or only paused phases? We have also seen an increase in the time taken to dedupe strings while snapshots are being taken. My guess is that G1GC is taken by surprise by the memory pressure the snapshot puts on the heap.
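One way to tell more frequent collections apart from slower individual ones is to diff two samples of the cumulative young collector counters from the node stats (`jvm.gc.collectors.young`). This is a minimal sketch; the function name and the sample numbers are hypothetical, and it says nothing about which GC phases the JVM counts in that time.

```python
def young_gc_delta(before: dict, after: dict) -> dict:
    """Diff two snapshots of the cumulative `young` collector stats."""
    count = after["collection_count"] - before["collection_count"]
    millis = after["collection_time_in_millis"] - before["collection_time_in_millis"]
    # Avoid dividing by zero when no collection happened in the interval.
    avg = millis / count if count else 0.0
    return {"collections": count, "total_ms": millis, "avg_ms": avg}


# Hypothetical samples taken just before and just after a snapshot.
before = {"collection_count": 1000, "collection_time_in_millis": 50000}
after = {"collection_count": 1040, "collection_time_in_millis": 62000}
print(young_gc_delta(before, after))
```

If `avg_ms` stays flat while `collections` jumps, the extra time comes from frequency rather than from longer individual collections.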

Any recommendations for reducing young gen GC frequency?

More info available here

DavidTurner · May 16, 2020, 12:16pm

It's certainly possible: taking a snapshot does involve moving a load of data around and maybe we're creating more garbage than needed. There's been a lot of work on streamlining the snapshot process since 7.2. Off the top of my head I'm not sure if any of it would directly affect memory pressure but it might. Can you reproduce this in 7.7?

Faiz_Ahmed_Mushtak_H · May 16, 2020, 12:24pm

I'll need to check that, @DavidTurner.
We may plan an in-place upgrade, as Elasticsearch has already reached 7.7.

