Node Restart Due to Running Out of Memory


(banderon1) #1

We have been receiving these notifications more and more often lately (each time for a different instance), and would like further insight into why they are happening. The logs from around that time show a garbage collection at 17:22:09, followed by a few hundred “search.action” warnings with the following message:

Failed to send release search context org.elasticsearch.transport.SendRequestTransportException

We haven’t noticed any unusual spikes in traffic, so we don’t know what is causing this. We are considering upgrading the memory if that is the solution, but we would probably need to reduce our fault tolerance to offset the cost. Do you have any data on how much downtime has been avoided by having our cluster spread across 3 data centers rather than 2?

I also noticed that the "Default number of shards" is set to 1. I thought we originally had it set to 10, and I'm not sure when it changed. Can you suggest what it should be?


(system) #2