Fail to upgrade Kibana from 7.11 to 7.12

Kibana times out waiting for index migration.

{"type":"log","@timestamp":"2021-04-06T13:09:46-05:00","tags":["info","savedobjects-service"],"pid":9608,"message":"[.kibana] UPDATE_TARGET_MAPPINGS -> UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK"}

Kibana checks ten times for migration completion. When that fails, Kibana restarts, which kills the old task and starts a new one. The timeout appears to be too short, so the process keeps restarting and Kibana never finishes starting.

Hi @paulbrown4

Can you provide the full log file? Are there any other errors at the beginning of the log?

Thanks,
Liza

This is an excerpt from '/var/log/kibana/kibana.log'. It shows the migration attempts and the eventual failure. Watching the indices in Elasticsearch, I can see the '.kibana_7.12.0_001' index continuously grow and shrink as the task starts, gets canceled, and then starts again (under a new task id).

    [TimeoutError]: Request timed out"}
    {"type":"log","@timestamp":"2021-04-06T13:05:52-05:00","tags":["debug","savedobjects-service"],"pid":8167,"_tag":"Left","left":{"type":"retryable_es_client_error","message":"Request timed out","error":{"name":"TimeoutError","meta":{"body":null,"statusCode":null,"headers":null,"meta":{"context":null,"request":{"params":{"method":"GET","path":"/_tasks/LOFj6WLeTryp7gxeNP_ziA%3A64539749","body":null,"querystring":"wait_for_completion=true&timeout=60s","headers":{"user-agent":"elasticsearch-js/7.12.0-canary.1 (linux 3.10.0-1160.21.1.el7.x86_64-x64; Node.js v14.16.0)","x-elastic-product-origin":"kibana","x-elastic-client-meta":"es=7.12.0-canary.1,js=14.16.0,t=7.12.0-canary.1,hc=14.16.0"},"timeout":30000},"options":{},"id":585},"name":"elasticsearch-js","connection":{"url":"https://172.40.0.95:9200/","id":"https://172.40.0.95:9200/","headers":{},"deadCount":2,"resurrectTimeout":1617732472373,"_openRequests":0,"status":"dead","roles":{"master":true,"data":true,"ingest":true,"ml":false}},"attempts":3,"aborted":false}}}},"message":"[.kibana] UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK RESPONSE"}
    {"type":"log","@timestamp":"2021-04-06T13:05:52-05:00","tags":["error","savedobjects-service"],"pid":8167,"message":"[.kibana] Action failed with 'Request timed out'. Retrying attempt 10 out of 10 in 64 seconds."}

    {"type":"log","@timestamp":"2021-04-06T13:08:56-05:00","tags":["debug","savedobjects-service"],"pid":8167,"_tag":"Left","left":{"type":"retryable_es_client_error","message":"Request timed out","error":{"name":"TimeoutError","meta":{"body":null,"statusCode":null,"headers":null,"meta":{"context":null,"request":{"params":{"method":"GET","path":"/_tasks/LOFj6WLeTryp7gxeNP_ziA%3A64539749","body":null,"querystring":"wait_for_completion=true&timeout=60s","headers":{"user-agent":"elasticsearch-js/7.12.0-canary.1 (linux 3.10.0-1160.21.1.el7.x86_64-x64; Node.js v14.16.0)","x-elastic-product-origin":"kibana","x-elastic-client-meta":"es=7.12.0-canary.1,js=14.16.0,t=7.12.0-canary.1,hc=14.16.0"},"timeout":30000},"options":{},"id":660},"name":"elasticsearch-js","connection":{"url":"https://172.118.0.103:9200/","id":"https://172.118.0.103:9200/","headers":{},"deadCount":1,"resurrectTimeout":1617732596495,"_openRequests":1,"status":"dead","roles":{"master":true,"data":true,"ingest":true,"ml":false}},"attempts":3,"aborted":false}}}},"message":"[.kibana] UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK RESPONSE"}
    {"type":"log","@timestamp":"2021-04-06T13:08:56-05:00","tags":["info","savedobjects-service"],"pid":8167,"message":"[.kibana] UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK -> FATAL"}
    {"type":"log","@timestamp":"2021-04-06T13:08:56-05:00","tags":["error","savedobjects-service"],"pid":8167,"message":"[.kibana] migration failed, dumping execution log:"}

I can't just copy and paste the log as I have verbose logs enabled.
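
For anyone else with verbose logging enabled, one way to pull out only the migration lines is to filter the JSON log by tag. This is a minimal sketch, not anything Kibana ships: the function name `migration_lines` and the choice of levels to keep are my own assumptions, and it relies only on the `tags`, `@timestamp`, and `message` fields visible in the excerpt above.

```python
import json


def migration_lines(lines, tag="savedobjects-service",
                    levels=("info", "error", "fatal")):
    """Yield (timestamp, message) for JSON log entries that carry `tag`
    and at least one of `levels` in their "tags" array."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            # Wrapped or truncated lines are not valid JSON; skip them.
            continue
        if not isinstance(entry, dict):
            continue
        tags = entry.get("tags", [])
        if tag in tags and any(level in tags for level in levels):
            yield entry.get("@timestamp"), entry.get("message")
```

Passing an open file handle for the log as `lines` and printing each pair leaves the state transitions and retry messages without the debug payloads.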

Thanks @paulbrown4, let me see if someone from our migrations team can help. cc: @rudolf

@paulbrown4 You might be running into "7.12.0 upgrade migrations fail with timeout_exception or receive_timeout_transport_exception" (Issue #95321, elastic/kibana on GitHub).

Can you share the output of:

POST .kibana/_search?filter_path=aggregations
{
  "aggs": {
    "saved_object_type": {
      "terms": {"field": "type"}
    }
  }
}

And:

POST .kibana_task_manager/_search?filter_path=aggregations
{
  "aggs": {
    "saved_object_type": {
      "terms": {"field": "type"}
    }
  }
}

(since your Kibana is probably offline you'll have to use a REST client or curl to run these queries)
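
If curl is not handy, the same two queries can be sent from a short script. The following is a rough sketch using only the Python standard library; the helper names `build_type_count_request` and `run`, the basic-auth handling, and the 60-second timeout are assumptions for illustration, and the host and credentials depend on your cluster.

```python
import base64
import json
import urllib.request

# Same aggregation body as the two console requests above.
AGG_BODY = {"aggs": {"saved_object_type": {"terms": {"field": "type"}}}}


def build_type_count_request(es_url, index, username=None, password=None):
    """Build (but do not send) the POST <index>/_search request."""
    url = f"{es_url.rstrip('/')}/{index}/_search?filter_path=aggregations"
    headers = {"Content-Type": "application/json"}
    if username is not None:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(
        url, data=json.dumps(AGG_BODY).encode(), headers=headers, method="POST"
    )


def run(request, context=None):
    """Send the request and return the decoded JSON response.
    Pass an ssl.SSLContext as `context` if the cluster uses a private CA."""
    with urllib.request.urlopen(request, timeout=60, context=context) as resp:
        return json.load(resp)
```

Calling `run(build_type_count_request(es_url, ".kibana", user, password))` and then the same for `.kibana_task_manager` should return the two aggregation results.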

.kibana/_search?filter_path=aggregations

{
  "aggregations" : {
    "saved_object_type" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 278,
      "buckets" : [
        {
          "key" : "fleet-agent-events",
          "doc_count" : 1555626
        },
        {
          "key" : "visualization",
          "doc_count" : 1528
        },
        {
          "key" : "application_usage_daily",
          "doc_count" : 465
        },
        {
          "key" : "alert",
          "doc_count" : 326
        },
        {
          "key" : "action_task_params",
          "doc_count" : 301
        },
        {
          "key" : "lens-ui-telemetry",
          "doc_count" : 273
        },
        {
          "key" : "dashboard",
          "doc_count" : 220
        },
        {
          "key" : "siem-detection-engine-rule-status",
          "doc_count" : 219
        },
        {
          "key" : "ui-metric",
          "doc_count" : 201
        },
        {
          "key" : "search",
          "doc_count" : 166
        }
      ]
    }
  }
}

.kibana_task_manager/_search?filter_path=aggregations

{
  "aggregations" : {
    "saved_object_type" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "task",
          "doc_count" : 312
        }
      ]
    }
  }
}

The 1.5 million fleet-agent-events documents are indeed the cause of the failures. Please follow the "Workaround for large fleet-agent-events" section in "7.12.0 upgrade migrations fail with timeout_exception or receive_timeout_transport_exception" (Issue #95321, elastic/kibana on GitHub).
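
To spot this kind of outlier quickly in the aggregation output, a small helper can list the saved object types above some document count. This is only a sketch: the function name `dominant_types` and the 100,000 threshold are arbitrary assumptions, not a documented Kibana limit.

```python
def dominant_types(response, threshold=100_000):
    """Return (type, doc_count) pairs from the `saved_object_type` terms
    aggregation whose doc_count is at or above `threshold`."""
    buckets = response["aggregations"]["saved_object_type"]["buckets"]
    return [
        (bucket["key"], bucket["doc_count"])
        for bucket in buckets
        if bucket["doc_count"] >= threshold
    ]
```

Applied to the `.kibana` response posted above, only the `fleet-agent-events` bucket passes the threshold.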

@rudolf, that got me on the right track. Thank you.

Just as an FYI I had to do a couple of other things also.

First, I was getting this message:

{"type":"log","@timestamp":"2021-04-07T08:07:44-05:00","tags":["fatal","root"],"pid":1145,"message":"Error: Document \"tsvb-validation-telemetry\" has property \"tsvb-validation-telemetry\" which belongs to a more recent version of Kibana [7.10.0]. The last known version is [undefined]\n    at /usr/share/kibana/src/core/server/saved_objects/migrations/core/document_migrator.js:600:27\n    at Array.find (<anonymous>)\n    at nextUnmigratedProp (/usr/share/kibana/src/core/server/saved_objects/migrations/core/document_migrator.js:591:21)\n    at applyMigrations (/usr/share/kibana/src/core/server/saved_objects/migrations/core/document_migrator.js:337:18)\n    at DocumentMigrator.transformAndValidate [as transformDoc] (/usr/share/kibana/src/core/server/saved_objects/migrations/core/document_migrator.js:290:22)\n    at /usr/share/kibana/src/core/server/saved_objects/migrations/core/document_migrator.js:98:16\n    at Immediate.<anonymous> (/usr/share/kibana/src/core/server/saved_objects/migrations/core/migrate_raw_docs.js:88:17)\n    at processImmediate (internal/timers.js:461:21) {\n  data: {\n    type: 'tsvb-validation-telemetry',\n    id: 'tsvb-validation-telemetry',\n    attributes: { failedRequests: 57 },\n    references: [],\n    migrationVersion: { 'tsvb-validation-telemetry': '7.10.0' },\n    updated_at: '2021-02-05T18:54:11.888Z'\n  },\n  isBoom: true,\n  isServer: false,\n  output: {\n    statusCode: 422,\n    payload: {\n      statusCode: 422,\n      error: 'Unprocessable Entity',\n      message: 'Document \"tsvb-validation-telemetry\" has property \"tsvb-validation-telemetry\" which belongs to a more recent version of Kibana [7.10.0]. The last known version is [undefined]'\n    },\n    headers: {}\n  }\n}"}

However, the .kibana_x index was blocked for writes, so I had to do the following first:

PUT /<index>/_settings
{
  "settings": {
    "index.blocks.write": false
  }
}

Once I removed the tsvb doc, I was able to start Kibana. The downside, though, was that all of the saved objects were gone. (I had previously exported them, so I just re-imported them.)
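
For reference, the two steps (lift the write block, then remove the document) could be scripted roughly as below. This is a sketch under assumptions: the thread does not say how the document was actually removed, so the `_delete_by_query` on the `type` field is one possible way rather than the method used here, and `build_unblock_and_delete` is a made-up helper name. The index name stays a placeholder you must fill in, and it is worth taking a snapshot first.

```python
import json
import urllib.request


def build_unblock_and_delete(es_url, index, saved_object_type):
    """Build (but do not send) two requests:
    1. PUT  <index>/_settings         to clear index.blocks.write
    2. POST <index>/_delete_by_query  to delete documents whose `type`
       field equals `saved_object_type` (assumed removal method)."""
    base = f"{es_url.rstrip('/')}/{index}"
    headers = {"Content-Type": "application/json"}
    unblock = urllib.request.Request(
        f"{base}/_settings",
        data=json.dumps({"settings": {"index.blocks.write": False}}).encode(),
        headers=headers,
        method="PUT",
    )
    delete = urllib.request.Request(
        f"{base}/_delete_by_query",
        data=json.dumps({"query": {"term": {"type": saved_object_type}}}).encode(),
        headers=headers,
        method="POST",
    )
    return unblock, delete
```

Each request can then be sent with `urllib.request.urlopen`, adding authentication headers as your cluster requires.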

Exporting saved objects is not a perfect backup for all of Kibana. For instance, you seem to be using alerting, and alerts are not included in the import/export, so I would recommend you try to recover the index instead.

Migrations will not change any existing indices so your 7.11 index should still be available. Is it possible that you deleted an index or removed the .kibana alias from one of the indices? Do you have a snapshot you can restore from?

What's the output of GET _cat/aliases?
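
If the alias list is long, requesting it as JSON (`GET _cat/aliases?format=json`) makes it easy to check where `.kibana` points. A minimal sketch, assuming the response has already been decoded into a list of rows with `alias` and `index` keys; the helper name `indices_for_alias` is hypothetical.

```python
def indices_for_alias(cat_aliases_rows, alias=".kibana"):
    """Return the sorted index names that `alias` points at, given the
    decoded JSON rows from GET _cat/aliases?format=json."""
    return sorted(
        row["index"] for row in cat_aliases_rows if row.get("alias") == alias
    )
```

An empty list would suggest the `.kibana` alias has been removed from every index.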

I only had a couple of alerts, but at some point during the update the alert data was inaccessible.

I ended up removing the 7.12 task manager and temp index, but that was it.

As for snapshots, I did not have the chance to take one before the upgrade. This particular cluster is fairly new but is needed to ingest production data, so I have been developing it as we go. The only things I will have to do are recreate the spaces and reimport the necessary objects for them, which is not a lot.