Upgrade Elasticsearch from version 7.16 to 7.17

Hi all,

I recently upgraded my Elasticsearch cluster from version 7.16 to 7.17 and encountered several issues. Kibana is not loading properly, and I am seeing license-related errors and "missing authentication credentials" errors. As a troubleshooting step, I manually closed some system indices such as .kibana_task_manager and .kibana-event-log-*. Here are some of the errors I’ve observed in the logs:

  • License errors preventing access to Kibana and certain API endpoints.

    {
      "type": "log",
      "@timestamp": "2025-01-29T15:45:55+03:30",
      "tags": [
        "error",
        "plugins",
        "security",
        "authentication"
      ],
      "pid": 15802,
      "message": "License is not available, authentication is not possible."
    }
    
  • Missing authentication credentials for some requests.

    {
      "type": "log",
      "@timestamp": "2025-01-29T15:45:55+03:30",
      "tags": [
        "warning",
        "plugins",
        "securitySolution"
      ],
      "pid": 15802,
      "message": "Unable to verify endpoint policies in line with license change: failed to fetch package policies: missing authentication credentials for REST request [/.kibana_7.17.0_7.17.0/_search?from=0&rest_total_hits_as_int=true&size=100]: security_exception: [security_exception] Reason: missing authentication credentials for REST request [/.kibana_7.17.0_7.17.0/_search?from=0&rest_total_hits_as_int=true&size=100]"
    }
    
  • Random 503 errors and cluster health issues.

    {
      "type": "response",
      "@timestamp": "2025-01-29T15:45:55+03:30",
      "tags": [],
      "pid": 15802,
      "method": "post",
      "statusCode": 503,
      "req": {
        "url": "/api/monitoring/v1/clusters/kVfChQJ8TWWjr5Mrbw_nEg",
        "method": "post",
        "headers": {
          "x-real-ip": "192.168.7.178",
          "x-forwarded-for": "192.168.7.178",
          "host": "logsummerize.sw.shatel.com",
          "connection": "upgrade",
          "content-length": "101",
          "sec-ch-ua-platform": "\"Windows\"",
          "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
          "sec-ch-ua": "\"Google Chrome\";v=\"131\", \"Chromium\";v=\"131\", \"Not_A Brand\";v=\"24\"",
          "content-type": "application/json",
          "kbn-version": "7.17.0",
          "sec-ch-ua-mobile": "?0",
          "accept": "*/*",
          "origin": "https://logsummerize.sw.shatel.com",
          "sec-fetch-site": "same-origin",
          "sec-fetch-mode": "cors",
          "sec-fetch-dest": "empty",
          "referer": "https://logsummerize.sw.shatel.com/app/monitoring",
          "accept-encoding": "gzip, deflate, br, zstd",
          "accept-language": "en-US,en;q=0.9,fa;q=0.8"
        },
        "remoteAddress": "172.22.129.189",
        "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
        "referer": "https://logsummerize.sw.shatel.com/app/monitoring"
      },
      "res": {
        "statusCode": 503,
        "responseTime": 129,
        "contentLength": 86
      },
      "message": "POST /api/monitoring/v1/clusters/kVfChQJ8TWWjr5Mrbw_nEg 503 129ms - 86.0B"
    }
    

Has anyone faced similar issues after upgrading, or does anyone have suggestions on how to resolve these errors, particularly those related to the system indices and license validation?

I appreciate any help or insights!

All of those suggest issues in your Elasticsearch cluster; you need to check the Elasticsearch logs to troubleshoot.

How many nodes do you have? How did you perform the upgrade?

Please check the logs of all nodes for any hints.

My cluster has 4 nodes. To upgrade from version 7.16 (with nodes running on CentOS 7), I removed one node and joined a new node with all the new specifications, including IP, hostname, etc., running on Debian 12. I continued this process until the last node, and finally, the cluster version was upgraded to 7.17.0. I used the old cluster certificates and did not create new certificates before joining the new nodes, nor did I change any user passwords. In the end, I replaced Kibana with the new node and entered the license into the new Kibana, and all steps were completed successfully.
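
For reference, the license status on the cluster itself can be confirmed with something like the following minimal sketch (the node address is one of the hosts from the Kibana config further down; the credentials are placeholders):

    import requests

    ES_URL = "http://elastic1.software.shatel:9200"  # placeholder node address
    AUTH = ("elastic", "XXX")                        # placeholder credentials

    # GET /_license reports the license status, type and expiry date.
    resp = requests.get(f"{ES_URL}/_license", auth=AUTH)
    resp.raise_for_status()
    lic = resp.json()["license"]
    print(lic["status"], lic["type"], lic.get("expiry_date"))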

Did you check the logs for your nodes? What is the status of your cluster?

How did you remove the nodes? The correct approach is to exclude the node from allocation so it can move its data to the other nodes, and then remove it once it is empty.
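
A minimal sketch of that drain-then-remove approach (node name, address, and credentials are placeholders) could look like this:

    import requests

    ES_URL = "http://elastic1.software.shatel:9200"  # placeholder node address
    AUTH = ("elastic", "XXX")                        # placeholder credentials

    # Exclude the node from shard allocation so its shards move to the other nodes.
    settings = {"persistent": {"cluster.routing.allocation.exclude._name": "es-node1"}}
    requests.put(f"{ES_URL}/_cluster/settings", json=settings, auth=AUTH).raise_for_status()

    # Check how many shards are still on the node; shut it down only when this reaches 0,
    # and remember to clear the exclude setting afterwards.
    shards = requests.get(f"{ES_URL}/_cat/shards?format=json", auth=AUTH).json()
    print(sum(1 for s in shards if s["node"] == "es-node1"), "shards still on es-node1")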

Or did you use the same data disks for the new nodes and just replace the OS with the new installation?

You need to check the logs and share them.

There are no logs related to this issue on the cluster nodes. The only log I found related to a failed process in the cluster was the same log I sent you. The actions I have taken to resolve this issue are as follows:

  1. Increased the JVM heap on the cluster nodes.
  2. Enabled xpack.monitoring and xpack.reporting.
  3. Closed all .kibana indices related to version 7.16, which did not solve the issue, so I reopened them (a sketch for double-checking their status follows this list).
  4. Added a new Kibana instance with version 7.17.27, which also had no effect.
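
A minimal sketch (placeholder node address and credentials) of how the open/closed status of those .kibana indices can be double-checked:

    import requests

    ES_URL = "http://elastic1.software.shatel:9200"  # placeholder node address
    AUTH = ("elastic", "XXX")                        # placeholder credentials

    # List every .kibana* index together with its open/closed status and health.
    indices = requests.get(
        f"{ES_URL}/_cat/indices/.kibana*?format=json&expand_wildcards=all",
        auth=AUTH,
    ).json()
    for idx in indices:
        print(idx["index"], idx["status"], idx["health"])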

If you need any further information, please let me know, and I will provide it.

In fact, on each node I would stop the Elasticsearch service, and once the cluster status changed to yellow and all of that node's shards became unassigned, I would start the Elasticsearch service on the new node so that it joined the cluster and picked up the unassigned shards.

Regarding the second question, I should mention that the new nodes were installed on new and independent machines.
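
For what it's worth, a minimal sketch (placeholder node address and credentials) of how the recovery could be watched between each node swap:

    import time
    import requests

    ES_URL = "http://elastic1.software.shatel:9200"  # placeholder node address
    AUTH = ("elastic", "XXX")                        # placeholder credentials

    # Poll cluster health until all shards are assigned again and the cluster is green.
    while True:
        health = requests.get(f"{ES_URL}/_cluster/health", auth=AUTH).json()
        print(health["status"], "unassigned shards:", health["unassigned_shards"])
        if health["status"] == "green" and health["unassigned_shards"] == 0:
            break
        time.sleep(30)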

You didn't share any Elasticsearch logs. Without Elasticsearch logs from the same time frame in which you had the issues in Kibana, it is not possible to help with any troubleshooting.

When you have these issues in Kibana, you need to check the logs of the Elasticsearch nodes, or at least of the ones configured in Kibana, and share them.

Kibana is on the same version as Elasticsearch, right? When you were upgrading your cluster did you stop the old Kibana version before upgrading it?

This is what is configured in Kibana:
server.port: 5601
server.host: 0.0.0.0
server.name: "kibana"
elasticsearch.hosts: ["http://elastic1.software.shatel:9200","http://elastic2.software.shatel:9200","http://elastic101.software.shatel:9200","http://elastic102.software.shatel"]
elasticsearch.username: "elastic"
elasticsearch.password: "XXX"
kibana.autocompleteTimeout: 10000
kibana.autocompleteTerminateAfter: 10000000
xpack.reporting.encryptionKey: "iKv8dYG+5z3FJoBSJtfqyL68Aqt5uaSR/AYmP8oyrgY="
xpack.encryptedSavedObjects.encryptionKey: "iKv8dYG+5z3FJoBSJtfqyL68Aqt5uaSR/AYmP8oyrgY="
xpack.security.encryptionKey: "lCtN/1mlmT+a5tkMYicW2cHPHTykUYbIxpOo4tL0utU="
xpack.monitoring.kibana.collection.enabled: true
map.emsUrl: https://elasticmap.sw.shatel.com
server.publicBaseUrl: http://kibana.software.shatel:5601
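
As a quick sanity check, something like this minimal sketch (hosts copied from elasticsearch.hosts above; credentials are placeholders) would show whether every configured host answers and which version it reports:

    import requests

    # Hosts copied verbatim from the elasticsearch.hosts setting above.
    HOSTS = [
        "http://elastic1.software.shatel:9200",
        "http://elastic2.software.shatel:9200",
        "http://elastic101.software.shatel:9200",
        "http://elastic102.software.shatel",
    ]
    AUTH = ("elastic", "XXX")  # placeholder credentials

    # GET / on each host returns the node name and the version Kibana would see.
    for host in HOSTS:
        try:
            info = requests.get(host, auth=AUTH, timeout=5).json()
            print(host, info["name"], info["version"]["number"])
        except requests.RequestException as exc:
            print(host, "unreachable:", exc)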

Currently, the version of Kibana is 7.17.27, but at the beginning of the migration, the version was 7.17.0, which is the same version as the cluster.
Regarding your last question, I should mention that Kibana was the last component to be upgraded to version 7.17.0, because until the last moment the cluster was on version 7.17.0 while Kibana was still on version 7.16.2.
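
For completeness, a minimal sketch (placeholder URL and credentials) of how the running Kibana version can be read from its status API:

    import requests

    # Placeholder Kibana URL and credentials; /api/status reports the running Kibana version.
    KIBANA_URL = "http://kibana.software.shatel:5601"
    AUTH = ("elastic", "XXX")

    status = requests.get(f"{KIBANA_URL}/api/status", auth=AUTH).json()
    print("Kibana version:", status["version"]["number"])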

Is Elasticsearch also on 7.17.27? The Kibana version should be the same as the Elasticsearch version; while it may work when the patch version is different, you should use the same version.

Also, as mentioned, it is not possible to know what the issue may be without some evidence from the Elasticsearch logs.

Do you have a paid license, or is it the Basic license? If you have a Platinum or Enterprise license, I suggest that you open a support ticket.

This is a bit difficult to believe. There are 8 nodes here: the 4 old ones and the 4 new ones.

At what point in your upgrade process did you start to encounter issues?

My understanding of your upgrade process was that you started with 4 cluster nodes and a Kibana node, all on CentOS and all running 7.16:

  1. You removed es-node1 from the cluster; the cluster went yellow and a bunch of shards became unassigned.
  2. You added a brand-new, empty node, also called es-node1, on the same IP. This node was now running 7.17 on Debian, which must have included copying some settings, certificates, etc.
  3. That node successfully joined the cluster and shards were allocated to it.
  4. You waited until everything had finished shuffling and the cluster status was green.

You then repeated that for nodes 2, 3, and 4, and finally did the same with the Kibana node, except there were no shards/data to copy.

What does _cluster/health tell you right now?

"Yes, I went through exactly the same steps with the difference that the new node I was replacing had a different hostname and IP. Currently, the cluster is green, and I haven’t lost any shards. It has been running in the production environment for almost two weeks now."
"But I still see this 503 error in Kibana, and it appears randomly."

Thanks for confirming.

So my understanding now is that you actually felt you had completely successfully upgraded your cluster onto the new hardware/OS/ES-version, but then some other issues have appeared.

If all ES nodes agree on the cluster composition and health, then I would agree and presume you have successfully upgraded/migrated your cluster. If Kibana "almost always" works, that's probably fine too, in the sense of the upgrade.

Just to be sure, check /_cluster/health and /_cat/nodes on all nodes in the cluster, and please share the output here. Please also double-check that the old nodes are either completely offline or at least not running any ES or Kibana processes any more.
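
For example, a minimal sketch (placeholder node addresses and credentials) that queries each node directly, so any disagreement between nodes shows up:

    import requests

    # Placeholder node addresses; query each node directly rather than via a load balancer.
    NODES = [
        "http://elastic1.software.shatel:9200",
        "http://elastic2.software.shatel:9200",
        "http://elastic101.software.shatel:9200",
        "http://elastic102.software.shatel:9200",
    ]
    AUTH = ("elastic", "XXX")  # placeholder credentials

    for node in NODES:
        health = requests.get(f"{node}/_cluster/health", auth=AUTH).json()
        nodes_list = requests.get(f"{node}/_cat/nodes?v", auth=AUTH).text
        print(node, "->", health["status"], "nodes seen:", health["number_of_nodes"])
        print(nodes_list)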

If that's all fine, then you are only left with intermittent issues to resolve. For that, we'd need to know more about the pattern of the 503s, any logs around that time, etc.

A 503 means the service was unavailable, and I doubt ES responds to a valid request with a 503 without logging anything at all to indicate why it was unable to respond. By the way, is the 503 immediate, or does it take a little while?