Kibana 7.12 Login failure "Fleet"

Upgrading to 7.12 has been one nightmare after another. I've been using Fleet for "Endpoint" and so far it's been less than pleasant. Got to love Beta.

Issue:
Unable to log in to Kibana with the elastic user, so you know this is an annoying problem. The same credentials work just fine with curl, so it's not me fat-fingering the password every time.

A couple errors that repeat hundreds of times:

["warning","plugins","monitoring","monitoring","kibana-monitoring"],"pid":1316,"message":"Unable to bulk upload the stats payload to the local cluster"}

["warning","plugins","monitoring","monitoring","kibana-monitoring"],"pid":1316,"message":"Error: [export_exception] failed to flush export bulks\n at respond (/usr/share/kibana/node_modules/elasticsearch/src/lib/transport.js:349:15)

"message":"Error: [export_exception] failed to flush export bulks\n at respond"

"message":"Unable to bulk upload the stats payload to the local cluster"

At first I took a look at the Task Manager troubleshooting page in the Kibana Guide [master]. It only told me what I already expected. The cluster was running fine on 7.11.2 with all the same agent settings and SIEM rules enabled. In fact, I had almost double the rules running on 7.11.2 and it didn't bat an eye. 7.12 has been unstable, with extreme CPU usage caused by Java on all nodes. I ended up adding 2 more CPU cores per VM and it just ate them as well. Nodes are sitting at 90 to 100% all day, where previously they sat at 5%, rising to 15% when reports were running.

EDIT: To anyone who runs into this issue with 7.12: check to make sure your Index Lifecycle policy ran! Mine was stopped after the update, which let the indices use up all the disk space, and that prevents logging in. It was the Fleet log lifecycle; the rest were fine.
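For anyone who wants to script that check, here is a minimal sketch using only the Python standard library. It assumes the cluster answers at https://localhost:9200 with basic auth and a certificate Python trusts; the URL and the helper names (`build_request`, `ilm_is_running`, `check_and_start_ilm`) are placeholders of mine, not anything from the original setup. It reads `GET _ilm/status` and, if ILM is not running, calls `POST _ilm/start`.

```python
import base64
import json
import urllib.request

ES_URL = "https://localhost:9200"  # assumption: adjust to your cluster


def build_request(path, user, password, method="GET"):
    """Build a basic-auth request for an Elasticsearch API path."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(ES_URL + path, method=method)
    req.add_header("Authorization", "Basic " + token)
    return req


def ilm_is_running(status):
    """True if a parsed _ilm/status response reports RUNNING."""
    return status.get("operation_mode") == "RUNNING"


def check_and_start_ilm(user, password):
    """Read the ILM status and ask Elasticsearch to start ILM if it is stopped."""
    with urllib.request.urlopen(build_request("/_ilm/status", user, password)) as resp:
        status = json.load(resp)
    if ilm_is_running(status):
        return status
    start = build_request("/_ilm/start", user, password, method="POST")
    with urllib.request.urlopen(start) as resp:
        return json.load(resp)
```

Calling `check_and_start_ilm("elastic", "<password>")` needs a reachable cluster. With a self-signed certificate the request fails unless you pass your own SSL context to `urlopen`.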

The lifecycle is set back to the default after the update, which is longer than what is needed in my case. You'll also need to keep an eye on it, as it's not honoring the size setting and will go the entire day before a rollover. I have it set at 50 GB, and several of the indices were 300+ GB before a rollover.
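One way to spot indices that blew past the rollover size is to compare primary store sizes against the limit. The sketch below assumes you have already fetched and parsed the JSON from `GET _cat/indices?format=json&bytes=b&h=index,pri.store.size`; the index names in the sample are made up, and `oversized_indices` is my own helper name. The rollover size condition is evaluated against primary shard size, so that is the column used here. For the lifecycle state of a single index, `GET <index>/_ilm/explain` is the place to look.

```python
GIB = 1024 ** 3  # Elasticsearch's "gb" unit is 1024^3 bytes


def oversized_indices(cat_rows, limit_bytes):
    """Return (index, primary_bytes) pairs above limit_bytes, largest first."""
    found = []
    for row in cat_rows:
        raw = row.get("pri.store.size")
        if raw is None:  # rows without a reported size are skipped
            continue
        size = int(raw)
        if size > limit_bytes:
            found.append((row["index"], size))
    return sorted(found, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample rows, shaped like the _cat/indices JSON output.
sample = [
    {"index": "logs-a-000001", "pri.store.size": str(300 * GIB)},
    {"index": "logs-b-000001", "pri.store.size": str(12 * GIB)},
    {"index": "closed-index", "pri.store.size": None},
]

for name, size in oversized_indices(sample, 50 * GIB):
    print(f"{name}: {size / GIB:.0f} GiB")
```

ILM only checks rollover conditions periodically, so a small overshoot is normal. Anything several times over the limit suggests the policy is not being applied or ILM is not running.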

It's unclear to me if you're still running into issues or if you were able to resolve it. Do you mind clarifying? Thanks!

Resolved the issue. It was related to the Elastic Agent logs and metrics issue in 7.12.1, which ended up sucking up all drive space almost instantly. I was forced to delete several days' worth of data just to be able to get in and disable logs and metrics for the Elastic Agents in Fleet. After that it was able to come back online.

IMHO
That is a really bad design on Kibana's part, to be that tightly integrated on startup. It would have been 100x easier to be able to get back into Kibana and quickly see what the cause was. Having to drop to curl and dig around to find out that Elastic had hit the hard limit at 80% drive space cost hours of frustration. When your drives show 1 TB free and you can't recall what the limits were because they were set a year ago... Honestly, decoupling Kibana so it isn't reliant on Fleet objects or Elastic would make our lives a lot easier.
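For what it's worth, the digging can be shortened with a small check against the allocation API. This is a rough sketch, assuming the rows come from `GET _cat/allocation?format=json` and that the limit is the 80% mentioned above; the node names are invented and `nodes_over_limit` is a hypothetical helper. The configured watermarks themselves show up under `cluster.routing.allocation.disk.watermark.*` in `GET _cluster/settings?include_defaults=true`.

```python
def nodes_over_limit(allocation_rows, limit_percent):
    """Return (node, disk_percent) pairs at or above limit_percent."""
    over = []
    for row in allocation_rows:
        raw = row.get("disk.percent")
        if raw is None:  # e.g. the UNASSIGNED row has no disk figures
            continue
        used = int(raw)
        if used >= limit_percent:
            over.append((row["node"], used))
    return over


# Hypothetical sample rows, shaped like the _cat/allocation JSON output.
rows = [
    {"node": "es-node-1", "disk.percent": "83"},
    {"node": "es-node-2", "disk.percent": "41"},
    {"node": "UNASSIGNED", "disk.percent": None},
]

for node, used in nodes_over_limit(rows, 80):
    print(f"{node} is at {used}% disk, at or over the 80% limit")
```

The point is that the percentage is what matters to Elasticsearch, not the free space in absolute terms, which is how a drive with 1 TB free can still be over the limit.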
