I have a hosted Elastic Cloud deployment on AWS, set up using the aws.data.highio.i3 tier with 2-zone fault tolerance in us-east-1. My endpoint is https://xxxxxxxxxxx.us-east-1.aws.found.io:9243
Our web servers reverse proxy to the above endpoint. We've had this configuration in place for quite some time without any issues.
Yesterday (Aug 11), between 17:05:00 EDT and 18:37:00 EDT, we were unable to connect to the endpoint. We logged over 2,000 connection errors during that time frame.
I started digging into this today by looking at the logs from our web servers, as well as the Elastic Cloud logs and our monitoring cluster. I have not been able to find anything that suggests what might have caused the problem, and the cluster health appears to have been good throughout that period. I also did not find any downtime notices posted on the Elastic status page or the AWS status page.
I'm really trying to understand what could have caused the problem and how it corrected itself. My only guess is a routing issue between our network and AWS, although our logs clearly show failures to multiple different AWS IPs resolved from the above endpoint, and we did not notice any problems with our other AWS-related services during the same time frame.
Does anyone have any ideas about what else I could check on the Elastic Cloud hosted side that might give more clues as to what happened?
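In case it is useful context, here is a minimal sketch of the kind of per-IP check that could be run from one of the web servers if this happens again, to help tell apart a problem with one resolved address from a problem with all of them. It only uses the Python standard library. The function names (resolve_ips, probe, check_endpoint, report) are made up for illustration, and the hostname is the redacted placeholder from above, so it would need to be replaced with the real endpoint. It tests a plain TCP connect only, not TLS or the Elasticsearch API itself.

```python
import socket
import time


def resolve_ips(host, port):
    """Return the distinct IP addresses DNS currently returns for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def probe(ip, port, timeout=5.0):
    """Attempt one TCP connection to ip:port; return (ok, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000
            return True, f"connected in {elapsed_ms:.0f} ms"
    except OSError as exc:
        # Covers timeouts, refused connections and unreachable networks.
        return False, f"{type(exc).__name__}: {exc}"


def check_endpoint(host, port, resolver=resolve_ips, prober=probe):
    """Probe every resolved IP and return {ip: (ok, detail)}."""
    return {ip: prober(ip, port) for ip in resolver(host, port)}


def report(host, port):
    """Print one timestamped line per resolved IP."""
    for ip, (ok, detail) in check_endpoint(host, port).items():
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {ip}:{port} {'OK' if ok else 'FAIL'} {detail}")
```

Calling report("xxxxxxxxxxx.us-east-1.aws.found.io", 9243) with the real hostname, on a loop or from cron, would show whether every resolved IP fails at the same time or only some of them do.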