Intermitten Connection Failures when Querying Elasticsearch

We have a Rails API that queries Elasticsearch via the elasticsearch-rails gem. At around 10:30 this morning, we started getting a bunch of Faraday::ConnectionFailed errors. They don't happen every time, but they've been happening a bunch. Looking at our Elasticsearch console, I can see that our CPU, memory, and disk usage are all within healthy limits (no more than 50% capacity). We haven't made any code/infrastructure changes in a week, so I'm a little lost on how to debug this properly.

We've tried replacing the elasticsearch-rails gem with elasticsearch and are still getting ConnectionFailed errors, though these ones are slightly different/more verbose:

[Faraday::ConnectionFailed] Failed to open TCP connection to [REDACTED] (execution expired) 
{:scheme=>"https", :user=>"elastic", :password=><REDACTED>, :host=>"[REDACTED]", :port=>443, :protocol=>"https"}

[Faraday::ConnectionFailed] Attempt 1 connecting to 
{:scheme=>"https", :user=>"elastic", :password=><REDACTED>, :host=>"[REDACTED]", :port=>443, :protocol=>"https"}

The difficult/frustrating part about this bug is that it only happens intermittently, so it's hard to pin down the exact cause. Because it's happening in all of our environments (production, staging, qa), I'm led to believe it's a code issue and not an infrastructure one. Especially because all of our infrastructure is reporting itself to be healthy. But as I said earlier, we haven't made a code change in about a week, so I'm not sure why it suddenly started failing to connect so often.

Any help or direction would be greatly appreciated, as our platform is at its core a search platform that relies on Elasticsearch. Thanks.

More information: when connecting to our ES cluster from a local dev machine, things work as expected. When trying to hit the same ES cluster from a GCP cloud container, we see the errors.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.