Lost all Fleet agent policies and Security Rules after upgrade to 8.2

I have a cluster running in a lab that I use to test out things before we do them in production. This cluster started out on version 7.14.x roughly. Fleet and agents have been installed and upgraded several times, through 7.16, 7.17, 8.0, 8.1 and recently to 8.2. I believe the cluster was on 8.1.2 and instead of going to 8.1.3, I just decided to upgrade to 8.2.

Upgrading the Elasticsearch nodes seemed to go fine, but Kibana would not start after the upgrade. I didn't have /var/log/messages logging enabled at the time, so I believe it was in journalctl that I found 2 errors. After I corrected them, Kibana was able to start. However, all the Elastic Security rules and all the agent policies were missing, and they still are.

I can't find the errors now (I believe I had to use journalctl to view them), but one of them was fixed with this command:
curl -k -u '<user>:<password>' -X PUT 'https://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": null}, "persistent": {"cluster.routing.allocation.enable": null}}'
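For anyone who wants to see which value of this setting is in effect now, here is a minimal sketch using the cluster settings API. ES_URL, ES_USER, ES_PASS and the function name are placeholders chosen for illustration, not something taken from this cluster.

```shell
# Assumed connection details; override them via the environment.
ES_URL="${ES_URL:-https://localhost:9200}"
ES_USER="${ES_USER:-elastic}"
ES_PASS="${ES_PASS:-changeme}"

# Query string: include built-in defaults and keep only the allocation setting.
SETTINGS_QUERY='include_defaults=true&filter_path=*.cluster.routing.allocation.enable&pretty'

# Hypothetical helper: prints the setting from whichever section carries it
# (persistent, transient, or defaults).
show_allocation_setting() {
  curl -k -s -u "${ES_USER}:${ES_PASS}" \
    "${ES_URL}/_cluster/settings?${SETTINGS_QUERY}"
}
```

Running show_allocation_setting should show whether the value comes from an explicit persistent or transient setting or from the defaults section.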

Can anyone provide some context on what might have happened, and on whether the missing data can still be retrieved or recovered?

What I have done so far: I created new policies for fleet server, default 1, and windows server 1, and I redeployed the fleet server integration and 2 agents. There was no way to upgrade or unenroll the old agents from the UI, but I found that they could be forcibly removed with this API:

curl -k -u elastic:homelab --request POST --url 'https://localhost:5601/api/fleet/agents/<agent_id>/unenroll' --header 'content-type: application/json' --header 'kbn-xsrf: xx' --data-raw '{"force":true,"revoke":true}' --compressed

So I could continue to remove the agents and recreate the policies. However, it would be good to find out what happened and why. If anyone has suggestions on what to check or what information to collect to answer that question, I'd be interested to know.

Edit: I added the screenshot. The 3 policies shown are the ones I recreated, and I reinstalled their respective agents. The rest of the list, showing agents without policies, are the old agents. I also notice now that the 2 test spaces I had created months ago are missing.

Thanks!
Robert

Hi @robhep,

I'm not sure about Fleet + Elastic Agent, but you mentioned your security solution rules are all missing after your upgrade to 8.2?

Can you run the query below in Dev Tools (as a user with the superuser role) and let us know if you see anything? I can't think of anything that would erase all of the rules after an upgrade, so I'm curious whether the data is still there and something in the middle is preventing it from showing up. If the data is not there, I'm wondering if something went wrong with the routing. You mentioned having to update the cluster routing allocation settings. There are some docs here, but my understanding is that by setting

{"transient": {"cluster.routing.allocation.enable": null}, "persistent": {"cluster.routing.allocation.enable": null}}

you essentially are asking Elasticsearch to not allocate any shards for any indices, which could explain why there is missing data. Did you have other data stored in this cluster outside of security solution rules and fleet policies?

GET .kibana*/_search
{
  "query": {
    "terms": {
      "alert.alertTypeId": [
        "siem.queryRule",
        "siem.eqlRule",
        "siem.indicatorRule",
        "siem.mlRule",
        "siem.savedQueryRule",
        "siem.thresholdRule"
      ]
    }
  }
}
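
A possible follow-up, offered as a sketch rather than a confirmed procedure: because .kibana* matches every versioned saved-objects index, the total hit count alone does not show which index holds the rules. The request below uses the same terms filter with "size": 0 and a terms aggregation on _index, so the response lists rule counts per index. ES_URL, ES_USER, ES_PASS and the function name are assumptions made for illustration.

```shell
# Assumed connection details; override them via the environment.
ES_URL="${ES_URL:-https://localhost:9200}"
ES_USER="${ES_USER:-elastic}"
ES_PASS="${ES_PASS:-changeme}"

# Same rule-type filter as the query above, but returning only per-index counts.
RULE_COUNT_BODY='{
  "size": 0,
  "query": {
    "terms": {
      "alert.alertTypeId": [
        "siem.queryRule",
        "siem.eqlRule",
        "siem.indicatorRule",
        "siem.mlRule",
        "siem.savedQueryRule",
        "siem.thresholdRule"
      ]
    }
  },
  "aggs": {
    "rules_per_index": {
      "terms": { "field": "_index", "size": 50 }
    }
  }
}'

# Hypothetical helper: prints only the aggregation part of the response.
count_rules_per_index() {
  curl -k -s -u "${ES_USER}:${ES_PASS}" \
    -H 'Content-Type: application/json' \
    -X POST "${ES_URL}/.kibana*/_search?filter_path=aggregations" \
    -d "${RULE_COUNT_BODY}"
}
```

Calling count_rules_per_index should print one bucket per index that contains matching rule documents, which makes it easier to tell whether the rules live in the index for the current Kibana version or only in older ones.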

Thanks Devin. Sorry for not including this in the first post, but when I noticed the rules were missing and the button to Enable Rules was available, I clicked it. So the rules have been "re"loaded back in already. However, I ran the above and this is the first portion of the output.

#! this request accesses system indices: [.kibana_7.15.0_001, .kibana_7.15.1_001, .kibana_7.15.2_001, .kibana_7.16.0_001, .kibana_7.16.1_001, .kibana_7.16.2_001, .kibana_7.16.3_001, .kibana_7.17.0_001, .kibana_8.0.0_001, .kibana_8.0.1_001, .kibana_8.1.0_001, .kibana_8.1.2_001, .kibana_8.2.0_001, .kibana_security_session_1, .kibana_task_manager_7.15.0_001, .kibana_task_manager_7.15.1_001, .kibana_task_manager_7.15.2_001, .kibana_task_manager_7.16.0_001, .kibana_task_manager_7.16.1_001, .kibana_task_manager_7.16.2_001, .kibana_task_manager_7.16.3_001, .kibana_task_manager_7.17.0_001, .kibana_task_manager_8.0.0_001, .kibana_task_manager_8.0.1_001, .kibana_task_manager_8.1.0_001, .kibana_task_manager_8.1.2_001, .kibana_task_manager_8.2.0_001], but in a future major version, direct access to system indices will be prevented by default
{
  "took" : 259,
  "timed_out" : false,
  "_shards" : {
    "total" : 71,
    "successful" : 71,
    "skipped" : 57,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3234,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".kibana_8.0.0_001",
        "_id" : "alert:9e394619-59e1-11ec-abd2-13d660df7903",
        "_score" : 1.0,
        "_source" : {
          "alert" : {
            "name" : "Connection to Internal Network via Telnet",
            "tags" : [
              "Elastic",
              "Host",
              "Linux",
              "Threat Detection",
              "Lateral Movement",
              "__internal_rule_id:1b21abcc-4d9f-4b08-a7f5-316f5f94b973",
              "__internal_immutable:true"
            ],
            "alertTypeId" : "siem.eqlRule",
            "consumer" : "siem",
            "params" : {
              "author" : [
                "Elastic"
              ],
              "description" : "Telnet provides a command line interface for communication with a remote device or server. This rule identifies Telnet network connections to non-publicly routable IP addresses.",
              "ruleId" : "1b21abcc-4d9f-4b08-a7f5-316f5f94b973",
              "falsePositives" : [
                "Telnet can be used for both benign or malicious purposes. Telnet is included by default in some Linux distributions, so its presence is not inherently suspicious. The use of Telnet to manage devices remotely has declined in recent years in favor of more secure protocols such as SSH. Telnet usage by non-automated tools or frameworks may be suspicious."
              ],
              "from" : "now-9m",
              "immutable" : true,
              "license" : "Elastic License v2",
              "outputIndex" : "",
              "maxSignals" : 100,
              "riskScore" : 47,
              "riskScoreMapping" : [ ],
              "severity" : "medium",
              "severityMapping" : [ ],