I originally mentioned this on jessgreen's thread, which is quite similar.
We use Elasticsearch 6.2.4 as our document content index with Micro Focus Content Manager. I am working to implement Content Manager as a replacement for Records Manager, which is the previous incarnation of the application.
I have been indexing the content of the documents in the system and have come across what appears to be a problem document: it causes out-of-memory errors in the Elasticsearch heap space. The document is a 63 MB Excel spreadsheet containing around 800,000 rows of data.
I consider our requirements for Elasticsearch to be quite modest, so I have only set up a small cluster of two servers with 8 GB of RAM, and therefore a 4 GB heap. Even so, 4 GB is far larger than 63 MB, so, being new to Elasticsearch, I would not expect this document to tax the system in this way. My reading of the Content Manager re-index log suggests that this document's content is sent in transactions of around 10 MB each. The first 70 MB is sent without issue; another 70 MB is then sent with no response from Elasticsearch, the requests time out, and at some point the service fails.
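To see how close the heap gets to its limit while this document is being indexed, the JVM figures from the node stats API (GET /_nodes/stats/jvm) could be polled during the re-index. The following is only a minimal sketch: the URL http://localhost:9200, the node name es-node-1 and the sample numbers are assumptions made up for illustration, not values from my cluster.

```python
import json
from urllib.request import urlopen

# Assumption: a node is reachable here without authentication; adjust as needed.
ES_URL = "http://localhost:9200"


def heap_summary(stats):
    """Map node name -> (used MB, max MB, used %) from a parsed
    GET /_nodes/stats/jvm response."""
    summary = {}
    for node in stats["nodes"].values():
        mem = node["jvm"]["mem"]
        used_mb = round(mem["heap_used_in_bytes"] / 1024 ** 2)
        max_mb = round(mem["heap_max_in_bytes"] / 1024 ** 2)
        summary[node["name"]] = (used_mb, max_mb, mem["heap_used_percent"])
    return summary


def fetch_heap_summary(base_url=ES_URL):
    """Query a live cluster; defined here but not called automatically."""
    with urlopen(base_url + "/_nodes/stats/jvm", timeout=10) as resp:
        return heap_summary(json.load(resp))


# Made-up sample response, trimmed to the fields used above.
sample = {
    "nodes": {
        "hypothetical-node-id": {
            "name": "es-node-1",
            "jvm": {
                "mem": {
                    "heap_used_in_bytes": 3221225472,
                    "heap_max_in_bytes": 4294967296,
                    "heap_used_percent": 75,
                }
            },
        }
    }
}

print(heap_summary(sample))
```

Calling fetch_heap_summary() every few seconds while the re-index log shows the problem document being sent should indicate whether heap usage climbs steadily towards the maximum before the timeouts begin.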
NB: I am using the re-indexing feature in Content Manager, but I am in fact building a brand-new index.
Edited: 63 MB, not 63 KB.