Kibana Fleet "Another broken 7.11.1 upgrade"

I'll be honest, this has been painful: so far every upgrade from Kibana 7.1 to 7.11.1 has resulted in something going wrong, across 1 prod cluster and 2 dev clusters. Maybe I just have terrible luck. All indices have been on default settings for months now because I haven't had time to apply the changes between versions.

The test node for this case is a standalone node with all roles, including transform.

Some people are lucky to get Kibana 7.11.1 to start at all. Most people's failures have been with Fleet, but that is not the only problem in my case.

Upgraded 7.10.2 to 7.11.0: failed, had to roll back due to a Fleet bug. After the rollback all was fine.
Waited a few days. Upgraded from 7.10.2 to 7.11.1: total failure, leaving me with dozens of issues.

First one:
Upgrading to 7.11.1 has resulted in Kibana not doing anything at all. The service starts but generates no logs, even though it's configured for verbose logging, and it never serves a web page. It's a dead service: even journalctl shows no events beyond the service starting. I let it sit in this state for 9+ hours and nothing changed. I did the basics: restarted the server and checked for logs, but none existed. Troubleshooting a failed start with no output isn't really possible, so the only option was to downgrade.

Downgrading to the 7.11.0 "repo version" is a total bust for Fleet, but at least Kibana starts now.

And then...

[illegal_argument_exception] updating component template [logs-endpoint.events.file-mappings] results in invalid composable template [logs-endpoint.events.file] after templates are merged response from /_component_template/logs-endpoint.events.file-mappings: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"updating component template [logs-endpoint.events.file-mappings] results in invalid composable template [logs-endpoint.events.file] after templates are merged"}],"type":"illegal_argument_exception","reason":"updating component template [logs-endpoint.events.file-mappings] results in invalid composable template [logs-endpoint.events.file] after templates are merged","caused_by":{"type":"illegal_argument_exception","reason":"template [logs-endpoint.events.file] has alias and data stream definitions"}},"status":400}

Refreshing the page gives me one of six different errors, so it's hard to pinpoint the cause. Any tips on how to fix this and keep it from happening again?
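The `caused_by` part of that error says the composable template ends up with both aliases and a data stream definition once its component templates are merged. A minimal sketch for spotting that combination, assuming you have saved the parsed JSON bodies of `GET _index_template` and `GET _component_template`: it lists composable templates that have a `data_stream` section and resolve to at least one alias, either their own or one pulled in through `composed_of`. The function name is hypothetical, and the merge here is a simplification of what Elasticsearch does, not its actual validation code.

```python
def conflicting_templates(index_templates_resp, component_templates_resp):
    """Return names of composable templates that define a data stream while
    also resolving to one or more aliases (own or via composed_of).

    Simplified model of the merge check behind the 7.11.0 error; hypothetical helper.
    """
    # Component template name -> aliases it contributes (may be absent).
    component_aliases = {
        ct["name"]: ct["component_template"].get("template", {}).get("aliases", {})
        for ct in component_templates_resp.get("component_templates", [])
    }
    conflicts = []
    for it in index_templates_resp.get("index_templates", []):
        body = it["index_template"]
        if "data_stream" not in body:
            continue  # not a data stream template, so aliases are allowed
        # Start from the template's own aliases, then merge component ones.
        aliases = dict(body.get("template", {}).get("aliases", {}))
        for name in body.get("composed_of", []):
            aliases.update(component_aliases.get(name, {}))
        if aliases:
            conflicts.append(it["name"])
    return conflicts
```

Any name it returns is a template where the alias has to come out (or the template be reinstalled) before Fleet can update it again.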

The 7.11.1 update contains the fix for the composable/component template error. If you want or need to stay on 7.11.0, a workaround for that error is described here:


The fact that 7.11.1 does not start at all is a separate issue; let me check whether it's a known one.

It is possible you've run into elastic/kibana issue #91869 on GitHub ("7.11.0 saved object migrations take very long to complete due to huge number of fleet-agent-events"). The workaround is to delete all fleet-agent-events saved objects with

POST .kibana/_delete_by_query
{
  "query": {
    "bool": {
      "must": [
        { "term": { "type": "fleet-agent-events" } }
      ]
    }
  }
}

before upgrading Kibana.
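On a cluster with many agents it may be worth counting the matching saved objects before deleting them. A sketch of that workaround with a dry-run step, assuming `session` behaves like an authenticated `requests.Session` and `base_url` points at Elasticsearch (not Kibana). The function name and the dry-run default are hypothetical additions; the query body is the one from the request above, sent to the real `_count` and `_delete_by_query` endpoints.

```python
# Same query body as the Dev Tools request: match fleet-agent-events saved objects.
FLEET_EVENTS_QUERY = {
    "query": {"bool": {"must": [{"term": {"type": "fleet-agent-events"}}]}}
}


def purge_fleet_agent_events(session, base_url, dry_run=True):
    """Count fleet-agent-events saved objects in .kibana, then optionally delete them.

    Assumes `session` behaves like requests.Session (post(url, json=...) returning
    a response with .json()) and base_url is e.g. "http://localhost:9200".
    Hypothetical helper; run it before upgrading Kibana.
    """
    count_resp = session.post(f"{base_url}/.kibana/_count", json=FLEET_EVENTS_QUERY)
    matched = count_resp.json()["count"]
    if dry_run or matched == 0:
        # Report only; nothing is deleted.
        return {"matched": matched, "deleted": 0}
    delete_resp = session.post(
        f"{base_url}/.kibana/_delete_by_query", json=FLEET_EVENTS_QUERY
    )
    return {"matched": matched, "deleted": delete_resp.json()["deleted"]}
```

Calling it once with the default `dry_run=True` shows how many objects the migration would otherwise have to process; passing `dry_run=False` then performs the same delete as the console request.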

That worked, thank you very much.

The node being tested had only 1 agent and only about 2 hours of logs on an NVMe drive. I'm not looking forward to the production cluster update, which has hundreds of devices talking to it.

Update:
SIEM rules for Endpoint are now broken: they report that logs-endpoint.alerts-* cannot be found, yet it's still present.

@PublicName could you please provide us with a screenshot of the error you see in the Rule Details page as well as a screenshot of the Rule Details -> Failure History tab?

In addition, could you please verify that the logs-endpoint.alerts-* index exists by running this command in Kibana -> Dev Tools:

GET logs-endpoint.alerts-*/_mapping
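One thing worth ruling out when reading that response: with default settings, a wildcard `_mapping` request that matches no index at all returns `{}`, which looks like "empty mappings" but is a different problem from an index that exists without field mappings. A small sketch that separates the two cases from the parsed response body; the function name is hypothetical, and treating "no `properties`" as empty is an assumption.

```python
def summarize_mapping_response(mapping_resp):
    """Split a parsed GET <pattern>/_mapping body into matched and empty indices.

    The body maps each concrete index name (for data streams, the .ds-* backing
    indices) to {"mappings": {...}}. An index counts as "empty" here when its
    mappings have no "properties". Hypothetical helper.
    """
    matched = sorted(mapping_resp)
    empty = [
        name
        for name in matched
        if not mapping_resp[name].get("mappings", {}).get("properties")
    ]
    return {"matched": matched, "empty": empty}
```

An empty `matched` list means the pattern resolved to nothing; a non-empty `empty` list means the indices exist but lost their field mappings.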

Mappings are now empty for logs-endpoint.alerts-* and for endgame-*. It seems they were wiped out by the upgrade. I confirmed this on a separate node that was also upgraded from 7.10.1 to 7.11.1: both have the same rule failures, and both indices are empty.

These are the two index patterns directly related to Endpoint Security alerts, at least for the current default rules (released 2/19/2021):

Adversary Behavior - Detected - Endpoint Security = ["endgame-*"]

Endpoint Security = ["logs-endpoint.alerts-*"]

@PublicName Did you have data in the logs-endpoint.alerts-* or endpoint-* indices prior to upgrade?

If not, this behavior is expected. In 7.11.1, we are more proactive in showing users that their rules may not be generating any results when their configured index patterns don't exist. For example, prior to 7.11, if a user had the Elastic Endpoint rule enabled but didn't have Elastic Endpoint configured and sending data, the rule would run successfully even though the logs-endpoint.alerts-* index didn't exist. In 7.11 and 7.11.1, we let users know their rules may not be generating alerts when the configured index patterns are non-existent by displaying a failure status. We understand this behavior of displaying failure statuses might be a bit jarring, so in 7.11.2 we will make this a warning status instead of an error status.
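The check described above can be approximated offline to see which default rules would be flagged. A sketch, assuming the parsed body of `GET _resolve/index/*` as input (add `?expand_wildcards=all` if hidden indices matter); the rule names and patterns are the two listed earlier in this thread. The function names are hypothetical, comma-separated and exclusion (`-`) patterns are not handled, and this only loosely models the 7.11/7.11.1 behavior rather than Kibana's actual implementation.

```python
from fnmatch import fnmatchcase

# Default rule -> index patterns, as listed earlier in this thread.
RULE_PATTERNS = {
    "Adversary Behavior - Detected - Endpoint Security": ["endgame-*"],
    "Endpoint Security": ["logs-endpoint.alerts-*"],
}


def resolvable_names(resolve_resp):
    """Collect index, alias and data stream names from a GET _resolve/index/* body."""
    return [
        item["name"]
        for key in ("indices", "aliases", "data_streams")
        for item in resolve_resp.get(key, [])
    ]


def rules_without_indices(rule_patterns, resolve_resp):
    """Return rules none of whose patterns match an existing name.

    fnmatchcase keeps matching case-sensitive; the patterns here only use '*'.
    """
    names = resolvable_names(resolve_resp)
    return sorted(
        rule
        for rule, patterns in rule_patterns.items()
        if not any(fnmatchcase(name, pattern) for pattern in patterns for name in names)
    )
```

Data streams are matched by their own name (for example `logs-endpoint.alerts-default`), not by their `.ds-*` backing indices, which is why the sketch reads the `data_streams` section too.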

Yes, I had data prior to the upgrade. The upgraded cluster was a dev one with several sandbox machines pointing to it, so several events had been triggered, which is why this is concerning. Both indices no longer have mappings, which is a good reason for the rules to fail.

Would you happen to know where on GitHub the default mappings live, so I can update the indices and get the rules triggering correctly again?

Jarring is an understatement.