I have an ES cluster with 3 nodes, each with 8GB RAM and a 100GB HDD, and I've set their heap sizes to 4GB. They contain 8 indices totalling 70GB of data, and the complete data set is updated daily at midnight. The problem I'm facing is that while indexing, ES keeps going out of memory and throws an EngineClosedException, and the indexing script gets a timeout.
What I want to know is whether my machines are undersized for this amount of data and indexing. If so, how large should the machines be?
I'm sending the data to ES in batches of 5000 documents per request, running at least 20 threads at a time, which I plan to increase to 40 by running two indexing scripts concurrently. I dump the output of the indexing script to a file, which records the timeouts, and when ES crashes its log records the EngineClosedException.
Each batch of 5000 entries comes to somewhere between 15 and 20 MB.
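For concreteness, the batching works roughly like this (a minimal sketch, not my actual script; the helper names `make_batches` and `batch_size_mb` are illustrative):

```python
import json

def make_batches(docs, batch_size=5000):
    """Split the document list into bulk-request batches (illustrative helper)."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

def batch_size_mb(batch):
    """Approximate size of one batch serialized as JSON lines, in MB."""
    payload = "\n".join(json.dumps(doc) for doc in batch)
    return len(payload.encode("utf-8")) / (1024 * 1024)

# Example: 12,000 documents -> 3 batches of 5000, 5000, and 2000
docs = [{"id": n, "value": "x" * 100} for n in range(12000)]
batches = list(make_batches(docs))
```

Each batch is then sent as one request, with 20 of these running in parallel threads.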