I'm still debugging the issue. What I've done so far is disable all our Metricbeat and Filebeat clients, as well as node logs and metrics. With all that disabled, CPU utilization dropped to around 10%, which is not 0%, but some utilization is expected from a running node. I was still seeing around 100 index and search requests per 5 minutes and I have no idea where they are coming from (internal housekeeping?).
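To pin down where those residual requests land, one option is to compare per-index indexing and search counters between two points in time (Kibana Dev Tools syntax; this is just a diagnostic sketch, not something support suggested):

```
GET _stats/indexing,search?level=indices
```

Sampling this twice, 5 minutes apart, and diffing `indexing.index_total` and `search.query_total` per index should show which indices are actually receiving the mystery requests.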
When I enabled just node metrics, CPU utilization jumped to around 20-25% (on a cluster with just 1 node: 4 GB RAM for Elasticsearch and 2 GB for Kibana), which seems very high for just collecting some metrics.
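If the overhead really comes from metrics collection, one thing that might be worth trying (my own assumption, not something support suggested) is raising the monitoring collection interval from the default 10s:

```
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.collection.interval": "30s"
  }
}
```

That would collect a third as many samples, at the cost of coarser monitoring graphs.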
When I enabled just node logs, CPU utilization seemed unimpacted (stayed at around 10%).
So I decided to enable node logs plus 4 of the 6 clients (the remaining 2 clients, which weren't enabled yet, produce the most logs but the same amount of metrics as the others), and CPU utilization seemed unimpacted (stayed at around 10%).
Then I enabled the last 2 clients, so all clients were back online, but node metrics (Stack Monitoring) stayed off. CPU utilization seemed unimpacted at first (in the morning, when I enabled them), but at around 18:00 it gradually increased to around 20-25%. Search requests stayed at around 100 per 5 minutes, but index requests increased to around 400 per 5 minutes.
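To see what those extra index requests actually contain, one heavy-handed option (so only temporarily, and only on a suspect index pattern; `filebeat-*` here is just an illustration) is to lower the indexing slowlog threshold to zero so every indexed document gets logged:

```
PUT filebeat-*/_settings
{
  "index.indexing.slowlog.threshold.index.debug": "0ms"
}
```

Setting the value back to `null` afterwards restores the default, since logging every operation adds load of its own.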
These are the current performance graphs (running everything except Stack Monitoring):
Before the upgrade to Elastic 7.10.0 we were paying $36/month (for the same 6-client setup) for one of the cheapest deployments (2 GB RAM for Elasticsearch and 1 GB for Kibana). That setup had constant memory pressure of around 75%, sometimes higher, and seemed to need more resources to ensure stable operation. That's why I doubled resources on both the Elasticsearch and Kibana side (4 GB RAM for Elasticsearch, 2 GB for Kibana), which now costs us 3x as much at $110/month, a significant cost increase for such a small deployment.
The deployment was running fine after that (on 7.6.2), but I needed Stack Monitoring alerts, and upgrading to 7.10 seemed like a good way to get them. At first everything was fine after the upgrade, but after a few days or even weeks (I don't remember exactly) I started receiving high CPU utilization alerts without any change on the Elastic/Kibana side or on our clients' side. When I contacted Elastic support, their only suggestion was to allocate even more resources to the cluster, which could easily double our cost again to about $220/month just to "fix" the problem.
The next step is to enable additional logging on our clients, which will put more pressure on the node/cluster. I want to see how CPU usage and memory pressure behave after that before I try to (re)enable Stack Monitoring and see what happens this time.
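For context, the additional logging on each client is just another Filebeat input; a minimal sketch of such a config (the log path and cloud credentials below are placeholders, not our real values):

```
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/myapp/*.log    # hypothetical application log path

cloud.id: "<deployment-cloud-id>"    # from the Elastic Cloud console
cloud.auth: "<user>:<password>"
```

Each extra input like this adds indexing load on the node, which is exactly what I want to measure.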
I'm also tweaking Index Templates and ILM policies in the meantime to see if that has any impact on node performance.
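The ILM policies I'm experimenting with are along these lines (the rollover sizes and retention below are values I'm trying out, not recommendations):

```
PUT _ilm/policy/logs-rollover-delete
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "5gb", "max_age": "7d" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Fewer, larger indices via rollover should mean less shard overhead on a single-node cluster than daily indices.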
Oh, and regarding having only 3 days of metrics in my previous post: it turns out that something was keeping only the last 3 daily metrics indices, but I have no idea what was deleting the older ones, since those indices weren't linked to any ILM policy.
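That the indices aren't ILM-managed can be double-checked directly (the `metricbeat-*` pattern is an assumption about the index naming):

```
GET metricbeat-*/_ilm/explain
```

Indices not managed by any policy come back with `"managed": false`; if they all do, the deletions must be coming from something external, such as Curator or a scheduled cleanup job.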