Do IndexWriter memory stats account for all memory?

Coming from https://github.com/elastic/elasticsearch/issues/41337, I've learnt that IndexWriter holds some memory when scroll contexts are open.
The theory is that my JVM OOMs are due to long-running scrolls and IndexWriters eating up all of the heap, but when monitoring indices/segments/index_writer_memory_in_bytes in _nodes/stats I can't see any abnormal pattern there.
Could you please help me understand why this is? Shouldn't those stats account for all IndexWriter memory?

Hi @Attila_Nagy, the memory we're looking at here isn't really being held by the IndexWriter; it's held by the readers attached to each scroll context. The index_writer_memory_in_bytes statistic tracks only the RAM used by the indexing buffer, which influences when that buffer is flushed.
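
For example, you can pull just that number out of the node stats with something like this (the filter_path only trims the response):

GET /_nodes/stats/indices/segments?filter_path=nodes.*.indices.segments.index_writer_memory_in_bytes

It covers only the in-memory indexing buffer, so the per-reader copies of the live-docs sets pinned by scroll contexts don't show up in it.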

Whether it should be accounted for under some other stats is another question, of course. It looks a little tricky to do this, because the bit sets are copy-on-write and may be shared between multiple scrolls if they are unchanged. Better than tracking it would be to apply backpressure if it got too high, and I think the real-memory circuit breaker introduced in 7.0.0 would do that.
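
If it helps, this is roughly what that breaker looks like in elasticsearch.yml in 7.0.0 (as far as I recall these are the defaults, so you shouldn't need to set them explicitly):

indices.breaker.total.use_real_memory: true
indices.breaker.total.limit: 95%

With use_real_memory enabled, the parent breaker compares the JVM's actual heap usage against that limit and starts rejecting requests when it is exceeded, regardless of which component is holding the memory.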


Hi,

I'm totally lost then.
You wrote in the issue:

the bulk of the 100s of MBs taken up by those org.apache.lucene.index.IndexWriters looks to be tracking the live docs in each segment, which is only needed for segments containing deletes

and according to the heap dumps, the largest heap consumers were indeed org.apache.lucene.index.IndexWriter instances.
Could you please clear up this mess in my head? :slight_smile:

BTW, will backpressure here mean it rejects nearly all queries (because there won't be enough memory for them)?
But if it knows what is too high, then it's measured somewhere, right?
I monitor the circuit-breaker-related stats (because of an older issue: https://github.com/elastic/elasticsearch/issues/27525). After upgrading to 7.0.0, will those include this kind of memory as well?
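For reference, what I currently watch is basically the parent breaker from the node stats, something like:

GET /_nodes/stats/breaker?filter_path=nodes.*.breakers.parent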

Thanks!

Sorry, it's kinda complicated and I'm perhaps not quite using the right terminology :slight_smile:

The IndexWriter looks after all the readers open on an index too, which is why the heap dump shows an IndexWriter retaining 354MB of heap. But if you drill into it one level you see that it's all in its readerPool field: it's really the readers, which need to keep track of old versions of the live-docs sets, that are taking up all your memory.

The real-memory circuit breaker looks at the heap usage according to the JVM; the JVM tracks everything but doesn't give you a detailed breakdown by type of usage. I think this corresponds to this statistic, even in 6.6:

GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_in_bytes

Thanks, that makes sense.
Now I'd just really like to see how this changes over time.
Sadly, monitoring the JVM heap won't show anything useful, due to GCs and everything else happening there; it's just a wildly changing graph.

It's very bad that I don't know how much memory is used by keeping those scrolls open, and that in either version (6.x, where something will hit an OOM, and 7.x, where queries will just be rejected, if I understand it right) there is no way to measure this.
I think even an approximation would be good here for a start, if the copy-on-write semantics make exact measurement hard.
May we have such a stat?
For other things Elasticsearch gives a pretty good way to figure out what is happening and to help debug problems, but this seems like a black box to me (no way to see detailed info about open scrolls, no way to see how much memory they hold, etc.).
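As far as I can tell, the closest thing is the count of open search contexts in the node stats, something like:

GET /_nodes/stats/indices/search?filter_path=nodes.*.indices.search.open_contexts

but that is just a count and says nothing about how much memory those contexts hold.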

It seems like a reasonable feature request. However, could you clarify how this will help you in production? If this statistic gets too high do you expect to be able to prevent your system from opening any further scrolls, or will you still be relying on pushback from Elasticsearch? As you rightly note, the real-memory circuit breaker will push back on all traffic, not just scrolls.

Yes, I could limit opening new scrolls.
But for diagnosis, this would help together with https://github.com/elastic/elasticsearch/issues/41376, so I could see which scroll contexts are alive and, by closing them, how much memory I could free up.
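For closing them I'd presumably use the normal clear-scroll API, e.g.:

DELETE /_search/scroll/_all

to drop everything, or DELETE /_search/scroll with specific scroll IDs in the body.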
Surely, the best would be if a _cat endpoint could show us the scrolls' memory needs.
