Do IndexWriter memory stats account for all memory?

Coming from https://github.com/elastic/elasticsearch/issues/41337, I've learnt that IndexWriter holds some memory when scroll contexts are open.
The theory is that my JVM OOMs are due to long-running scrolls and IndexWriters eating up all of the heap, but when monitoring indices/segments/index_writer_memory_in_bytes in _nodes/stats I can't see any abnormal pattern there.
Could you please help me understand why this is? Shouldn't those stats account for all IndexWriter memory?

Hi @Attila_Nagy, the memory we're looking at here isn't really being held by the IndexWriter; it's held by the readers attached to each scroll context. The index_writer_memory_in_bytes statistic tracks only the RAM used by the indexing buffer, which influences when that buffer is flushed.
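
For example, you can pull just that number out of the node stats with something like this (the filter_path only trims the response):

GET /_nodes/stats/indices/segments?filter_path=nodes.*.indices.segments.index_writer_memory_in_bytes

It covers only the in-memory indexing buffer, so the per-reader copies of the live-docs sets pinned by scroll contexts don't show up in it.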

Whether it should be accounted for under some other stats is another question, of course. It looks a little tricky to do this, because the bit sets are copy-on-write and may be shared between multiple scrolls if they are unchanged. Better than tracking it would be to apply backpressure if it got too high, and I think the real-memory circuit breaker introduced in 7.0.0 would do that.
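
If it helps, this is roughly what that breaker looks like in elasticsearch.yml in 7.0.0 (as far as I recall these are the defaults, so you shouldn't need to set them explicitly):

indices.breaker.total.use_real_memory: true
indices.breaker.total.limit: 95%

With use_real_memory enabled, the parent breaker compares the JVM's actual heap usage against that limit and starts rejecting requests when it is exceeded, regardless of which component is holding the memory.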


Hi,

I'm totally lost then.
You wrote in the issue:

the bulk of the 100s of MBs taken up by those org.apache.lucene.index.IndexWriters looks to be tracking the live docs in each segment, which is only needed for segments containing deletes

and according to the heap dumps, the largest heap consumers were indeed org.apache.lucene.index.IndexWriter instances.
Could you please clear up this mess in my head? :slight_smile:

BTW, will backpressure here mean it rejects nearly all queries (because there won't be enough memory for them)?
But if it knows what is too high, then it's measured somewhere, right?
I monitor the circuit-breaker-related stats (because of an older issue: https://github.com/elastic/elasticsearch/issues/27525). After upgrading to 7.0.0, will those include this kind of memory as well?
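For reference, what I currently watch is basically the parent breaker from the node stats, something like:

GET /_nodes/stats/breaker?filter_path=nodes.*.breakers.parent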

Thanks!

Sorry, it's kinda complicated and I'm perhaps not quite using the right terminology :slight_smile:

The IndexWriter looks after all the readers open on an index too, which is why the heap dump shows an IndexWriter retaining 354MB of heap. But if you drill into it one level you see that it's all in its readerPool field: it's really the readers, which need to keep track of old versions of the live-docs sets, that are taking up all your memory.

The real-memory circuit breaker looks at the heap usage according to the JVM; the JVM tracks everything but doesn't give you a detailed breakdown by type of usage. I think this corresponds to this statistic, even in 6.6:

GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_in_bytes

Thanks, that makes sense.
Now I'd just really like to see how this changes over time.
Sadly, monitoring the JVM heap won't show anything useful, due to GCs and everything else happening there; it's just a wildly changing graph.

It's very bad that I don't know how much memory is used by keeping those scrolls open, and that in either version (6.x, where something will hit an OOM, and 7.x, where queries will just be rejected, if I understand it right) there is no way to measure this.
I think even an approximation would be good here for a start, if the copy-on-write semantics make exact measurement hard.
May we have such a stat?
For other things Elasticsearch gives a pretty good way to figure out what is happening and to help debug problems, but this seems like a black box to me (no way to see detailed info about open scrolls, no way to see how much memory they hold, etc.).
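As far as I can tell, the closest thing is the count of open search contexts in the node stats, something like:

GET /_nodes/stats/indices/search?filter_path=nodes.*.indices.search.open_contexts

but that is just a count and says nothing about how much memory those contexts hold.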

It seems like a reasonable feature request. However, could you clarify how this will help you in production? If this statistic gets too high do you expect to be able to prevent your system from opening any further scrolls, or will you still be relying on pushback from Elasticsearch? As you rightly note, the real-memory circuit breaker will push back on all traffic, not just scrolls.

Yes, I could limit opening new scrolls.
But for diagnosis, this would help together with https://github.com/elastic/elasticsearch/issues/41376, so I could see which scroll contexts are alive and, by closing them, how much memory I could free up.
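For closing them I'd presumably use the normal clear-scroll API, e.g.:

DELETE /_search/scroll/_all

to drop everything, or DELETE /_search/scroll with specific scroll IDs in the body.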
Surely, the best would be if a _cat endpoint could show us the scrolls' memory needs.
