Upgrade to 8.1.2 failing with "unexpected folder encountered during data folder upgrade"

Hello,

I upgraded 1 of the 2 data nodes in my 3-node cluster (the 3rd node is voting-only and does not store any data) from 7.17.2 to 8.1.2. After the upgrade, the Elasticsearch service is crashing with the following error:

Exception
java.lang.IllegalStateException: unexpected folder encountered during data folder upgrade: /mnt/ssd1/var/lib/elasticsearch/nodes/0/_state_30-05-2021

A few more lines from the log file:

[2022-04-20T04:48:42,732][INFO ][o.e.e.NodeEnvironment    ] [secondarynode] oldest index version recorded in NodeMetadata 7.8.1
[2022-04-20T04:48:42,733][ERROR][o.e.b.Bootstrap          ] [secondarynode] Exception
java.lang.IllegalStateException: unexpected folder encountered during data folder upgrade: /mnt/ssd1/var/lib/elasticsearch/nodes/0/_state_30-05-2021
        at org.elasticsearch.env.NodeEnvironment.upgradeLegacyNodeFolders(NodeEnvironment.java:431) ~[elasticsearch-8.1.2.jar:8.1.2]

How do I resolve this error?

This is not a folder that Elasticsearch would create, so it looks like someone or something else has been meddling with the contents of the data path. This is strongly discouraged and can lead to all sorts of problems.

I would recommend restoring the cluster from a snapshot into a clean data path.
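For illustration only, a restore could look roughly like the sketch below; the repository and snapshot names are placeholders, and it assumes you already have a snapshot repository registered and are restoring into an empty cluster:

# Placeholder host, credentials, repository and snapshot names
curl -k -u elastic -X POST "https://primarynode:9200/_snapshot/my_repository/my_snapshot/_restore?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "indices": "*", "include_global_state": false }'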

Thank you very much @DavidTurner. It is odd, since the host is exclusively an ES node. Furthermore, Elasticsearch's data is written to an SSD separate from the OS disk, so /mnt/ should have no other data written to it, not even by the OS.

I do reckon this has something to do with last year's disk failure that you helped me diagnose in this thread (the month of the thread matches :expressionless: - Node sync fails and cluster goes to "red" - #21 by parthmaniar).

I will need to replace the SSD, meaning the new one will have no data (but the Elasticsearch settings and the OS will remain as is). Hence, is the following feasible for recovery?

  1. Attach a new SSD to the VM with the same mount path as the previous one.
  2. Upgrade the last remaining ES data node to 8.1.2. This node has all of the data.
  3. I have turned off shard allocation:
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
  4. Can I then re-enable shard allocation, which would fill up the new disk? Will this work? (See the sketch right after this list.)
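For step 4, this is the call I have in mind; it is only a sketch with a placeholder host and credentials, and setting the value to null should reset it to the default of "all":

# Placeholder host and user; resets cluster.routing.allocation.enable to its default
curl -k -u elastic -X PUT "https://primarynode:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{ "persistent": { "cluster.routing.allocation.enable": null } }'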

The reason I am stuck is:

  1. One data & master node is on 7.17.2 and it has all of the data intact.
  2. One data & master node is on 8.1.2, but its data disk (the ES data folder) has failed.
  3. The third node is voting-only and has been upgraded successfully.

Thank you very much.

Yes, if your cluster health is yellow then you can simply replace this node with a new (empty) 8.1.2 one and let Elasticsearch rebuild its contents. I recommend waiting until the health is green before upgrading the final node.
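If it helps, you can block until the cluster reports green with something like this (host and credentials are placeholders to adapt to your setup):

# Placeholder host/user; returns once health is green or after the 120s timeout
curl -k -u elastic "https://primarynode:9200/_cluster/health?wait_for_status=green&timeout=120s&pretty"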

This is where I am confused.

My current cluster status:

  1. primarynode with node.roles: [ data, master ] is fully operational running 7.17.2
  2. secondarynode with node.roles: [ data, master ] has storage failure and has been upgraded to 8.1.2
  3. votingonlynode with node.roles: [ master, voting_only ] is upgraded to 8.1.2 & ES service is running

I am unable to query the nodes (API calls via Postman are giving the following output):

{
    "error": {
        "root_cause": [
            {
                "type": "security_exception",
                "reason": "unable to authenticate user [elastic] for REST request [/_cluster/health/]",
                "header": {
                    "WWW-Authenticate": [
                        "Bearer realm=\"security\"",
                        "ApiKey",
                        "Basic realm=\"security\" charset=\"UTF-8\""
                    ]
                }
            }
        ],

I am not sure why, but I get these when one of the two master nodes goes down.
Maybe it is because of:
discovery.seed_hosts: ["primarynode", "secondarynode", "votingonlynode"]
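Translated from Postman to curl, the call I am making is roughly this (host and scheme are assumptions on my side; -k is only there because of the self-signed certificate):

# Placeholder host; prompts for the elastic user's password
curl -k -u elastic "https://primarynode:9200/_cluster/health?pretty"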

Any guidance here?

Anything in the logs? Assuming your credentials are correct I guess this means the cluster health is not yellow. I suggest downgrading secondarynode back to 7.17.2 until you work out what's going on here. Downgrades typically don't work, but it should be ok here since the "unexpected folder encountered during data folder upgrade" failure happens so early in startup.

Here is the current status (I think the initial error was because the cluster was still searching for the secondary node):

{

    "cluster_name": "data_analytics_1",
    "status": "red",
    "timed_out": false,
    "number_of_nodes": 2,
    "number_of_data_nodes": 1,
    "active_primary_shards": 1177,
    "active_shards": 1177,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 876,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 57.3307355090112

}
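To see which indices those 876 unassigned shards belong to, a quick way to list them (host and credentials are placeholders):

# Placeholder host/user; keep only unassigned shards with their recorded reason
curl -k -u elastic "https://primarynode:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED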

I've got a new SSD and the VM is up and running. The secondarynode has Elasticsearch 8.1.2 running (I got a prompt for 8.1.3 - I reckon Elastic needs to rethink its release cycle :expressionless: )

"cluster_name": "data_analytics_1",
    "status": "red",
    "timed_out": false,
    "number_of_nodes": 3,
    "number_of_data_nodes": 2,
    "active_primary_shards": 1189,
    "active_shards": 1189,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 1195,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 49.874161073825505
}

The status seems to be red since I have not enabled routing of shards.

Should I upgrade the last node to 8.1.3 (this is the only node with data right now) and then enable shard routing to sync the primary (running 7.17.2) and secondary nodes? Or should I enable syncing before the upgrade so that there are two copies of the data?

I think you should have downgraded as per my previous message, at least until you work out what's going on. The cluster is in red health now so it seems you've lost some primary shards.
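You can also ask the cluster why a particular shard is unassigned via the allocation explain API; called without a body it picks the first unassigned shard it finds (host and credentials below are placeholders):

# Placeholder host/user; explains the first unassigned shard the cluster finds
curl -k -u elastic "https://primarynode:9200/_cluster/allocation/explain?pretty"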

Thanks David. I will start the rebuild process. I have taken snapshots of the data, including a full backup of the VMs, before attempting the quick and, of course, ignorant fix. Sorry for that and thank you very much. :slight_smile:
