Network connections not being accepted in all cases

Hello Folks,

I have an Elasticsearch cluster (v1.7.5) on Ubuntu with 4 data nodes and 3 master nodes.

Under normal operation everything looks fine, with good response times.

This holds both when the cluster sits behind a load balancer and when I list all node IPs in the client (my application) so that it connects directly to each node.

However, intermittently, Elasticsearch does not accept new connections. By this I mean that the TCP three-way handshake is not completed: the initial SYN arrives at the host on the correct port, but no SYN-ACK is sent back.
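For anyone trying to reproduce this, a client-side probe can timestamp each failed handshake so it can be lined up against Marvel graphs or a packet capture. This is a minimal sketch in plain Python sockets, not anything Elasticsearch or NEST provides; the function name `probe_handshakes` and the default attempt count, timeout, and interval are hypothetical choices.

```python
import socket
import time
from datetime import datetime, timezone


def probe_handshakes(host, port, attempts=10, timeout=3.0, interval=1.0):
    """Attempt repeated TCP connects and return (timestamp, outcome) per failure.

    outcome is "timeout" when no SYN-ACK arrived within `timeout` seconds,
    otherwise the exception class name (e.g. "ConnectionRefusedError").
    Hypothetical helper for illustration only.
    """
    failures = []
    for _ in range(attempts):
        started = datetime.now(timezone.utc).isoformat()
        try:
            # create_connection returns once the three-way handshake completes;
            # the connection is closed immediately, nothing is sent.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except socket.timeout:
            failures.append((started, "timeout"))
        except OSError as exc:
            failures.append((started, type(exc).__name__))
        time.sleep(interval)
    return failures
```

Run against each data node (port 9200 for HTTP, or whichever port the client uses); an empty list means every handshake in that window completed.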

I have Marvel installed and have looked at the thread pool queues (LISTENER THREAD POOL REJECTED); they look fine (0 rejections).
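The same rejection counters Marvel graphs can also be read from the nodes stats API (`GET /_nodes/stats/thread_pool`), which returns per-node `thread_pool` sections containing a `rejected` count for each pool. A minimal sketch, assuming that response layout; the helper names `rejected_counts` and `fetch_thread_pool_stats` are hypothetical, and the base URL you pass in is up to you.

```python
import json
import urllib.request


def rejected_counts(stats):
    """Map (node name, pool name) -> rejected count from a
    _nodes/stats/thread_pool response body (already parsed as JSON)."""
    out = {}
    for node_id, node in stats.get("nodes", {}).items():
        name = node.get("name", node_id)  # fall back to the node id
        for pool, counters in node.get("thread_pool", {}).items():
            out[(name, pool)] = counters.get("rejected", 0)
    return out


def fetch_thread_pool_stats(base_url):
    """Fetch thread pool stats from one node, e.g. base_url="http://host:9200"."""
    url = base_url.rstrip("/") + "/_nodes/stats/thread_pool"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Usage would be `rejected_counts(fetch_thread_pool_stats("http://<node>:9200"))`, filtering for non-zero values. Note these counters only cover requests that already reached Elasticsearch over an established connection.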

I cannot see anything in the logs related to network issues, so I was wondering whether:
a) anyone has had similar issues
b) anyone can point me in a direction for further debugging

Thanks for your assistance in advance!

Kind Regards,
Matt

It sounds like a load balancer issue. Are there any connection reset or timeout values that you can change?

Hi Mark,

I've ruled out the load balancer itself by configuring the NEST client with the data node IPs directly.

In this configuration, the issue still occurs.

Support services were able to trace the unanswered SYN packet up to the point where it gets swallowed by /dev/null or similar.

Has no one had this type of issue before?

Could someone tell me whether Elasticsearch, under extreme load, ever ignores a connection attempt?

By connection attempt, I mean the first leg (SYN) of the TCP three-way handshake.

Or does it always answer with HTTP 503 Service Unavailable? (I have seen it do this before.)
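From the client side the two failure modes look different: an unanswered SYN shows up as a connect timeout, while a Service Unavailable reply only arrives after the handshake has completed. A minimal sketch using the Python standard library's `http.client`, separating the connect step from the request step; the function name `classify_endpoint` and its return labels are hypothetical.

```python
import http.client
import socket


def classify_endpoint(host, port, path="/", timeout=3.0):
    """Return "handshake-timeout", "refused", "response-timeout",
    or the HTTP status code (e.g. 503). Hypothetical helper."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        try:
            conn.connect()  # TCP three-way handshake only
        except socket.timeout:
            return "handshake-timeout"  # SYN sent, no SYN-ACK within timeout
        except ConnectionRefusedError:
            return "refused"  # host answered the SYN with a RST
        try:
            conn.request("GET", path)
            return conn.getresponse().status  # e.g. 200 or 503
        except socket.timeout:
            return "response-timeout"  # connected, but no HTTP reply in time
    finally:
        conn.close()
```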

I'm sorry to be so vague, but I don't know whether this is some kind of failure condition or a network issue with my host (Microsoft Azure).
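One detail that may help separate the two: on Linux (and other common TCP stacks) the SYN/SYN-ACK exchange is handled by the kernel, not by the application. A listening socket gets its handshakes completed even if the process never calls `accept()`; the application only picks up connections that are already established. The sketch below is plain Python sockets, not Elasticsearch, and the function name `handshake_without_accept` is hypothetical. It only demonstrates that general behaviour; it does not show why a particular SYN went unanswered.

```python
import socket


def handshake_without_accept(timeout=3.0):
    """Open a listener that never calls accept() and report whether a
    client's connect() (i.e. the three-way handshake) still succeeds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(5)  # backlog: queue of established, not-yet-accepted connections
    port = server.getsockname()[1]
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        # Returns once the kernel has answered the SYN; the server code
        # above never calls accept().
        client.connect(("127.0.0.1", port))
        return True
    except socket.timeout:
        return False
    finally:
        client.close()
        server.close()
```

Because the handshake is answered below the application, a SYN that gets no reply at all points more towards the operating system or network layer (for example a full listen queue, or a filtering device in the path) than towards Elasticsearch choosing to ignore it.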

Any ideas would be greatly appreciated!

Regards,
Matt