I have a recurring problem where my Kibana security alerts stop firing.
I have noticed this in two clusters, one running Kibana 7.14.1 and the other running 7.17.2. The queries for the configured rules run successfully, yet no alerts fire. When I restart Kibana on either cluster, the alerts return to normal and then fail again a few minutes later.
The Kibana 7.17.2 node is running on 32 GB of RAM, with two data nodes that have the same amount of RAM.
The Kibana 7.14.1 node is on 16 GB, with two data nodes that have 32 GB each as well.
I can't find anything wrong in my logs and can't tell what's causing this issue, except for the following error I saw in the 7.14.1 Kibana cluster:
The image you pasted shows a "circuit breaking" exception. Your cluster is under stress. That message is logged when Elasticsearch rejects an operation because it would push memory usage past a configured limit, which typically happens when it is handling A LOT of data. See the following for more info: Circuit breaker errors | Elasticsearch Guide [8.11] | Elastic
In this case, whatever logged this did not CAUSE the problem, it was just AFFECTED by it. The problem is somewhere else.
Since it seems your cluster is under stress, I'm guessing that's why things are going bad in Kibana.
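If it helps narrow things down, here is a rough sketch of how you might check which circuit breaker is tripping and on which node. It reads the `breaker` section of the Elasticsearch node stats API (`GET _nodes/stats/breaker`) and ranks breakers by trip count and current usage. The function names, the `http://localhost:9200` address, and the lack of authentication are all assumptions for illustration; a secured cluster would need credentials and HTTPS added.

```python
import json
import urllib.request


def fetch_breaker_stats(base_url="http://localhost:9200"):
    """Fetch circuit breaker stats for every node.

    Assumes an unauthenticated cluster at base_url (hypothetical default).
    """
    url = f"{base_url}/_nodes/stats/breaker"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_breakers(stats):
    """Return (node, breaker, percent_used, tripped) rows, worst first."""
    rows = []
    for node in stats.get("nodes", {}).values():
        for breaker_name, breaker in node.get("breakers", {}).items():
            limit = breaker.get("limit_size_in_bytes", 0)
            used = breaker.get("estimated_size_in_bytes", 0)
            # Guard against a zero or missing limit.
            pct = round(100.0 * used / limit, 1) if limit > 0 else 0.0
            rows.append(
                (node.get("name", "?"), breaker_name, pct, breaker.get("tripped", 0))
            )
    # Breakers that have tripped most often come first, then the fullest ones.
    rows.sort(key=lambda row: (row[3], row[2]), reverse=True)
    return rows
```

Calling `summarize_breakers(fetch_breaker_stats())` a few times while the alerts are failing should show whether the `tripped` count keeps climbing on one node or one breaker (for example `parent`), which is usually a better lead than the Kibana log line alone.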
The Kibana 7.17.2 node is running on 32 GB of RAM, with two data nodes that have the same amount of RAM.
The Kibana 7.14.1 node is on 16 GB, with two data nodes that have 32 GB each as well.
Just checking, each of these Kibana nodes is using its own set of Elasticsearch nodes as well, right? You can't really run two different Kibana versions against the same Elasticsearch cluster - that would certainly cause lots of problems.
You also don't need that much RAM for Kibana. Rarely would you ever need more than 4GB RAM.
7.14.x is pretty old; upgrading it to the latest 7.17.x may resolve your issues as well. And we have a lot of new features in 8.x you're missing out on!
First of all, thank you so much for your reply. Yes, I do have two clusters, and each Kibana instance has its own dedicated Elasticsearch cluster.
Thanks for the info. The upgrade is planned, but we have a few things to check policy-wise before proceeding with it, so it's not a solution I can follow right now. How can I find the root cause of this stress so I can fix it, if possible?