Cluster runs out of memory and service stops while indexing a single document

I originally mentioned this on jessgreen's thread, which describes a very similar problem.

We use Elasticsearch 6.2.4 as the document content index for Micro Focus Content Manager. I am implementing Content Manager as a replacement for Records Manager, the previous incarnation of the application.

I have been indexing the content of the documents in the system and have come across what appears to be a problem document: indexing it causes out-of-memory errors in the Elasticsearch heap space. The document is a 63 MB Excel spreadsheet containing around 800,000 rows of data.

I consider our requirements for Elasticsearch to be quite modest, so I have set up a small cluster of only 2 servers, each with 8 GB of RAM and therefore a 4 GB heap. Even so, 4 GB is much larger than 63 MB, so, being new to Elasticsearch, I would not expect this document to tax the system in this way. My reading of the Content Manager re-index log suggests that the document is sent in transactions of around 10 MB each. The first 70 MB is sent without issue; a further 70 MB is then sent with no response from Elasticsearch, the requests time out, and at some point the service fails.
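One way to see how close each node gets to its heap limit while the problem document is being indexed is to poll the nodes stats API (`GET /_nodes/stats/jvm`). The following is only a minimal sketch in Python, assuming the cluster is reachable at `http://localhost:9200` without authentication; `fetch_jvm_stats` and `summarize_heap` are illustrative names, not part of any Elasticsearch client.

```python
import json
import urllib.request


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """Fetch JVM statistics for every node (base_url is an assumption)."""
    with urllib.request.urlopen(base_url + "/_nodes/stats/jvm") as resp:
        return json.load(resp)


def summarize_heap(stats):
    """Return (node name, heap used %, max heap in MiB) for each node."""
    rows = []
    for node in stats["nodes"].values():
        mem = node["jvm"]["mem"]
        rows.append((
            node["name"],
            mem["heap_used_percent"],
            mem["heap_max_in_bytes"] // (1024 * 1024),
        ))
    return sorted(rows)
```

Calling `summarize_heap(fetch_jvm_stats())` repeatedly during the re-index would show whether heap usage climbs towards 100% as the 10 MB transactions arrive, which would at least confirm where the memory is going.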

NB: I am using the re-indexing feature in Content Manager, but in fact I am building a brand new index.

Edited: 63 MB, not 63 KB. :wink:

Well, the obvious, simple answer of increasing the RAM on the servers and reconfiguring has done the trick.
I still don't have a clue how I'm supposed to make an intelligent decision about sizing, though. Both Content Manager and Elasticsearch are black boxes to me.
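
For anyone repeating the "add RAM and reconfigure" step, the arithmetic behind the 8 GB RAM / 4 GB heap split can be sketched as below. This assumes the commonly cited guideline of giving the heap about half of physical RAM, with a ceiling of roughly 31 GB (an assumed round figure for staying under the compressed object pointer threshold). The function names are hypothetical, and the result is only a starting point: it says nothing about how much heap a particular workload actually needs.

```python
def suggested_heap_mb(ram_gb, ceiling_gb=31):
    """Half of physical RAM, capped at an assumed ceiling, in MiB."""
    heap_gb = min(ram_gb / 2, ceiling_gb)
    return int(heap_gb * 1024)


def jvm_heap_options(ram_gb):
    """Matching minimum and maximum heap flags for jvm.options."""
    heap_mb = suggested_heap_mb(ram_gb)
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


for ram_gb in (8, 16, 64):
    print(ram_gb, "GB RAM:", " ".join(jvm_heap_options(ram_gb)))
```

Setting the minimum and maximum to the same value keeps the JVM from resizing the heap at runtime.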

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right
