We're currently using Elastic Cloud, and we've noticed a pattern that occurs with our production cluster that involves us having to reboot about once a week.
We store contact information in our cluster, so we're constantly reindexing and performing queries on the cluster. I'd say it's pretty balanced between searches/indexing.
About once a week, we notice that one of our two nodes starts to pretty significantly underperform in terms of latency. We've got a circuit breaker pattern in place with an SQL search fallback to mitigate these scenarios in an automated fashion.
We've found that if we reboot the cluster, we're good for another 5-7 days, at which point, latency increases again, and another reboot is needed.
We're struggling to identify what the root cause of this particular issue is and to see if anyone else experiences something similar.