We are using an on-prem service to post data to our Elastic cluster in the cloud, hosted by Elastic. Until recently we had one instance running on AWS in Frankfurt to which we pushed data, and we never experienced timeouts or anything on the API calls. A couple of weeks ago we switched to a new Azure instance. Immediately after the switch we noticed quite poor API performance: we frequently see response times of 10 to 20 seconds. At peak we push about 8,000 small messages per minute. The health of the cluster all seems fine: no out-of-memory errors, no weird CPU spikes. Still, I get these massive timeouts, which cause hundreds of messages to fail. I'm using a data stream with a template to roll over from hot to warm, but I'm still in my first index. Does anyone have an idea why the API calls are sometimes so slow?
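Not an answer to the root cause, but while you investigate, one way to keep slow responses from turning into hundreds of failed messages is to retry timed-out posts with exponential backoff on the on-prem side. A minimal sketch, assuming a `post_fn` callable that wraps however you currently POST to the cluster (the function and parameter names here are hypothetical, not from any Elastic client):

```python
import random
import time


def post_with_retries(post_fn, payload, max_retries=5, base_delay=1.0):
    """Call post_fn(payload), retrying on TimeoutError with exponential
    backoff plus jitter, so transient slow responses don't drop messages."""
    for attempt in range(max_retries):
        try:
            return post_fn(payload)
        except TimeoutError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            # back off exponentially, with jitter to avoid retry storms
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

In practice you would wrap your `_bulk` call in `post_fn`; the official Elasticsearch clients also have their own retry settings, which may be the better place to configure this.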
Are you talking about our Elasticsearch Service?
Elastic Cloud, yes.
I did see errors on the instances, like:
|Oct 31, 2020, 2:01:08 AM UTC|ERROR|i5@westeurope-3|[instance-0000000005] collector [job_stats] timed out when collecting data|
|Oct 31, 2020, 2:01:06 AM UTC|ERROR|i2@westeurope-2|[instance-0000000002] collector [node_stats] timed out when collecting data|
|Oct 31, 2020, 1:23:18 AM UTC|ERROR|i5@westeurope-3|[instance-0000000005] collector [cluster_stats] timed out when collecting data|
After I resized the warm part of the cluster to something larger (4 GB to 15 GB), the response times dropped back to normal, and I didn't see the ERROR messages anymore. Could it be that with a fresh provisioning the erroneous instances were discarded?
You may have hit resource limits on the original nodes, so upgrading gave you more to work with.
I browsed through all the monitoring stats but couldn't see anything that seemed concerning. Is there some kind of matrix that shows available threads/IOPS per node size? Right now I can only see the amount of memory in GB, which also translates to storage.
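I'm not aware of a published per-size matrix, but the cluster itself exposes thread-pool usage via `GET _nodes/stats/thread_pool`; a growing `rejected` count on the `write` pool is a common sign of hitting indexing capacity. A minimal sketch of pulling those counts out of the response (the sample payload below is made up, only the shape follows the real API):

```python
import json

# Made-up sample shaped like GET _nodes/stats/thread_pool output
sample = json.loads("""
{
  "nodes": {
    "abc123": {
      "name": "instance-0000000005",
      "thread_pool": {
        "write":  {"threads": 2, "queue": 0, "rejected": 37},
        "search": {"threads": 4, "queue": 1, "rejected": 0}
      }
    }
  }
}
""")


def rejected_by_node(stats, pool="write"):
    """Map node name -> rejected task count for one thread pool."""
    return {
        node["name"]: node["thread_pool"][pool]["rejected"]
        for node in stats["nodes"].values()
    }


print(rejected_by_node(sample))
```

Watching that number before and after a resize would give you a more direct capacity signal than memory alone.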
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.