Slow Cluster in Elastic Cloud since updating to 7.12

Hi @mom-douellet, thanks for the information.

Indeed, my problem looks very similar to yours: CPU constantly at 100% and about 10 TB of inter-node data transfer, even though we ingested only about 8 GB in that interval. I'm not at my work computer right now, but here is an overview of what happened:

After nobody from the Elastic mod team could help here, I had to "pay to play". As I am the only one managing the cluster, I upgraded my license to Platinum and filed an urgent ticket. My cluster was green but unusable in the meantime, and I couldn't wait three business days, which is the SLA for normal tickets.

The engineers were very kind, provided two fixes within one day, and sent me a set of commands to run in Dev Tools several times (once every 4 hours).
After that, the Elastic dev team resolved the issue and the cluster was fixed.

I think you should roll back if you can (I couldn't find this option in Elastic Cloud). Elastic will probably release a wider fix to the public in the coming days or weeks. I hope for the best for your cluster.

Yeah, not sure I can roll back easily, but I'm in the fortunate position of being able to scrap the cluster and start a new one.

I will keep looking for an answer for a couple of days. Maybe the dev team will push the fix publicly.

Take care

If you have created any Spaces, you could be hitting this Kibana issue: "[Search Sessions] Kibana fails to update or delete sessions in non-default space" (Issue #96124 in elastic/kibana on GitHub).

You can check with a query like this:

GET .kibana_task_manager/_search
{ "size": 100 }
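
If the response is long, it can help to tally the hits by task type and status to see whether one kind of task dominates. Here is a minimal Python sketch for that, assuming you have saved the JSON response of the query above and that each hit carries `task.taskType` and `task.status` under `_source` (that layout is my assumption about the task manager documents; `summarize_tasks`, the sample data and the task type names are made up for illustration):

```python
from collections import Counter


def summarize_tasks(search_response):
    """Count hits in a _search response body by (taskType, status).

    Assumes each hit looks like {"_source": {"task": {"taskType": ..., "status": ...}}};
    missing fields are counted under "unknown".
    """
    counts = Counter()
    for hit in search_response.get("hits", {}).get("hits", []):
        task = hit.get("_source", {}).get("task", {})
        key = (task.get("taskType", "unknown"), task.get("status", "unknown"))
        counts[key] += 1
    return counts


# Hypothetical, trimmed response used only to show the shape of the input.
sample = {
    "hits": {
        "hits": [
            {"_id": "task:1", "_source": {"task": {"taskType": "example_task_a", "status": "idle"}}},
            {"_id": "task:2", "_source": {"task": {"taskType": "example_task_a", "status": "idle"}}},
            {"_id": "task:3", "_source": {"task": {"taskType": "example_task_b", "status": "running"}}},
        ]
    }
}

# Print the most frequent (taskType, status) pairs first.
for (task_type, status), n in summarize_tasks(sample).most_common():
    print(f"{n:4d}  {task_type}  {status}")
```

To use it on real data, load the saved response with `json.load` and pass the resulting dict to `summarize_tasks` instead of `sample`.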

A fix is coming in 7.12.1. In the meantime, that issue describes some workaround steps you can take.


Thanks @LeeDr, this was spot on.

The workaround resolved the CPU and network issues.