I have a multi-datacenter ES cluster. In our network, idle WAN TCP sessions are closed after a timeout (currently 60 minutes).
Strangely, ES does not seem to tolerate this. Every 60 minutes the nodes log "master left (reason = transport disconnected)". Several seconds later they successfully detect the master and rejoin the cluster.
I thought ES sent keepalive packets to keep the session alive and, if the connection broke, tried to re-establish it to the master before leaving the cluster. That does not seem to be the case.
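To be clear about what I mean by keepalive, here is a minimal Python sketch of TCP-level keepalive on a plain socket. It is not ES code and says nothing about how ES configures its transport connections. The function name `enable_keepalive` and the timing values are my own assumptions, chosen only so that probes would start well before a 60-minute idle timeout. The fine-grained options are platform-dependent, so they are applied only where the constants exist.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Enable TCP keepalive on sock (hypothetical helper, illustration only).

    idle_s:     seconds of idleness before the first probe is sent
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the connection is declared dead
    """
    # SO_KEEPALIVE is the portable on/off switch.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket timers are not available on every platform,
    # so each one is set only if the constant is defined.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    # A non-zero value means keepalive is switched on for this socket.
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With settings like these, an otherwise idle connection would carry a probe after 10 minutes, which I would expect to keep the WAN device from treating the session as idle.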
If you are indexing data at the moment the nodes disconnect, you lose some of it (about 3 minutes' worth), even though every document was acknowledged as accepted. All documents are submitted one by one, without the bulk API.
Could you help me find the cause of the problem?