Connection issues to ES cluster from .NET (System.Net.Sockets.SocketException)

Our Elasticsearch cluster runs in Azure with 3 data nodes, 3 master nodes, and 2 client nodes. NGINX sits in front of the client nodes as a proxy.

ES version: 2.2
NEST: 2.1.0

I am connecting from a .NET application using the NEST library and running a simple GET query to retrieve a document by Id. Every now and then, the query fails with a System.Net.Sockets.SocketException and the error message: "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond XX.XX.XX.XX:9200."

I have also tried the following:

  • Removing NGINX from the path and connecting directly to the client nodes, with only the Azure Load Balancer in front of them: same problem.
  • Using the native HttpClient instead of the NEST library: same problem.
  • Tweaking the connection settings on the Elastic client: same problem.

Currently, I am using a StaticConnectionPool with the following connection settings:

            var staticPool = new StaticConnectionPool(new[] {new Uri("http://es-url:9200")});
            var connection = new HttpConnection();
            var settings = new ConnectionSettings(staticPool, connection);
            settings = settings.EnableTcpKeepAlive(TimeSpan.FromMilliseconds(2000),
                                                   TimeSpan.FromMilliseconds(2000))
                               .RequestTimeout(TimeSpan.FromMinutes(10))
                               .MaximumRetries(10)
                               .DisableAutomaticProxyDetection()
                               .DisableDirectStreaming()
                               .MaxRetryTimeout(TimeSpan.FromMinutes(10))
                               .EnableHttpCompression()
                               .DisablePing();

Any suggestions? This is a blocking issue for us: connection attempts to ES fail with the exception above at a constant rate, and we would like to solve the problem as soon as possible.

Tejas

Based on what you have tried (using NEST and HttpClient), this sounds like an issue with the network.

A couple of things to try:

  1. What happens if you use SingleNodeConnectionPool instead of StaticConnectionPool? Since you're only configuring one Uri, you don't need the reseeding ability of StaticConnectionPool.

  2. Can you monitor TCP connections on the client box with TCPView, and also enable network tracing in your .NET application to capture more information about what is happening at the network level? NEST and Elasticsearch.Net use HttpWebRequest under the covers, so configuring tracing for System.Net should provide more details.
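
As a starting point for the tracing in point 2, something like the following in the application's app.config (or web.config) should work on the full .NET Framework. Treat it as a minimal sketch rather than a tuned setup: the log file name network.log is just a placeholder, and I am assuming the process has write access to its working directory.

```xml
<configuration>
  <system.diagnostics>
    <sources>
      <!-- HttpWebRequest-level events (requests, responses, connection handling) -->
      <source name="System.Net">
        <listeners>
          <add name="NetworkLog" />
        </listeners>
      </source>
      <!-- Socket-level events (connect, send, receive, close) -->
      <source name="System.Net.Sockets">
        <listeners>
          <add name="NetworkLog" />
        </listeners>
      </source>
    </sources>
    <switches>
      <!-- Verbose is noisy; lower it once a failure has been captured -->
      <add name="System.Net" value="Verbose" />
      <add name="System.Net.Sockets" value="Verbose" />
    </switches>
    <sharedListeners>
      <!-- network.log is a placeholder file name -->
      <add name="NetworkLog"
           type="System.Diagnostics.TextWriterTraceListener"
           initializeData="network.log" />
    </sharedListeners>
    <trace autoflush="true" />
  </system.diagnostics>
</configuration>
```

With this in place, reproduce the SocketException and look in the log for the socket entries around the failing request; that should show whether the connect attempt itself timed out or an existing connection was dropped.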