Upgraded first node from 7.17.5 to 8.4. Won't start, claiming there is a 6.5.4 index, but I can't find it

Hi Everyone, I have just started upgrading my cluster from 7.17.5 to 8.4. I completed the first node, but when I went to start the service, it complained that:

[2022-10-31T17:08:08,683][ERROR][o.e.b.Elasticsearch      ] [hostname] fatal exception while booting Elasticsearch
java.lang.IllegalStateException: cannot upgrade node because incompatible indices created with version [6.5.4] exist, while the minimum compatible index version is [7.0.0]. Upgrade your older indices by reindexing them in version [7.17.0] first.
	at org.elasticsearch.env.NodeEnvironment.checkForIndexCompatibility(NodeEnvironment.java:529) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.env.NodeEnvironment.upgradeLegacyNodeFolders(NodeEnvironment.java:408) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:301) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.node.Node.<init>(Node.java:456) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.node.Node.<init>(Node.java:311) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:214) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:214) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67) ~[elasticsearch-8.4.3.jar:?]

Before I started the upgrade, I ran through the upgrade advisor and upgraded all the indices it mentioned. The upgrade advisor is now clear, and I can't for the life of me work out which index(es) are causing this issue.

Running https://localhost:9200/.*/_settings?human shows that each of the indices has a created_string value that starts with a 7.
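For anyone repeating this check, here is a minimal sketch of scanning a saved `_settings?human` response for anything created before 7.0.0. The index names in `sample` are made up for illustration, and the code assumes the response was fetched separately and parsed into a dict. Note that the `.*` pattern only matches index names starting with a dot; querying `*` with `expand_wildcards=all` would cover every index.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.16.1' into (7, 16, 1) so versions compare numerically."""
    # Drop any qualifier such as '-beta1' before splitting on dots.
    return tuple(int(part) for part in text.split("-")[0].split("."))


def find_old_indices(settings: Dict[str, dict], minimum: str = "7.0.0") -> List[Tuple[str, str]]:
    """Return (index name, created version) pairs older than `minimum`."""
    floor = parse_version(minimum)
    old = []
    for name, body in settings.items():
        created = body["settings"]["index"]["version"]["created_string"]
        if parse_version(created) < floor:
            old.append((name, created))
    return sorted(old)


# Hypothetical sample shaped like a `_settings?human` response.
sample = {
    ".example_old": {"settings": {"index": {"version": {"created_string": "6.5.4"}}}},
    ".example_new": {"settings": {"index": {"version": {"created_string": "7.16.1"}}}},
}

print(find_old_indices(sample))
```

An empty list here only says that the live cluster metadata contains no pre-7.0 index; it says nothing about what an individual node has recorded on disk.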

Am I missing something here? I'd be super appreciative of any suggestions on how to identify these ghost indices that are preventing the node from starting!

This cluster has been through numerous upgrades over the last 5 years, and all nodes are running Ubuntu 20.04 (just upgraded). Let me know if there is any other info I can share to help with troubleshooting.

Welcome to our community! :smiley:

Can you share that output?
Do you have the output from the _cluster/stats?pretty&human API?

Surely can!

{
    "_nodes": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "cluster_name": "dtta-es01",
    "cluster_uuid": "I5niE6clQ9eafuQB-mw7xQ",
    "timestamp": 1667269356202,
    "status": "yellow",
    "indices": {
        "count": 137,
        "shards": {
            "total": 277,
            "primaries": 141,
            "replication": 0.9645390070921985,
            "index": {
                "shards": {
                    "min": 1,
                    "max": 10,
                    "avg": 2.021897810218978
                },
                "primaries": {
                    "min": 1,
                    "max": 5,
                    "avg": 1.0291970802919708
                },
                "replication": {
                    "min": 0.0,
                    "max": 1.0,
                    "avg": 0.9635036496350365
                }
            }
        },
        "docs": {
            "count": 1336162711,
            "deleted": 99924
        },
        "store": {
            "size": "1.1tb",
            "size_in_bytes": 1283453397808,
            "total_data_set_size": "1.1tb",
            "total_data_set_size_in_bytes": 1283453397808,
            "reserved": "0b",
            "reserved_in_bytes": 0
        },
        "fielddata": {
            "memory_size": "0b",
            "memory_size_in_bytes": 0,
            "evictions": 0
        },
        "query_cache": {
            "memory_size": "19.7kb",
            "memory_size_in_bytes": 20256,
            "total_count": 3482032,
            "hit_count": 113411,
            "miss_count": 3368621,
            "cache_size": 4,
            "cache_count": 5363,
            "evictions": 5359
        },
        "completion": {
            "size": "0b",
            "size_in_bytes": 0
        },
        "segments": {
            "count": 3720,
            "memory": "63.8mb",
            "memory_in_bytes": 66940264,
            "terms_memory": "36.4mb",
            "terms_memory_in_bytes": 38172184,
            "stored_fields_memory": "5.5mb",
            "stored_fields_memory_in_bytes": 5864240,
            "term_vectors_memory": "0b",
            "term_vectors_memory_in_bytes": 0,
            "norms_memory": "1.9mb",
            "norms_memory_in_bytes": 2041536,
            "points_memory": "0b",
            "points_memory_in_bytes": 0,
            "doc_values_memory": "19.8mb",
            "doc_values_memory_in_bytes": 20862304,
            "index_writer_memory": "3.4mb",
            "index_writer_memory_in_bytes": 3588368,
            "version_map_memory": "0b",
            "version_map_memory_in_bytes": 0,
            "fixed_bit_set": "215.3mb",
            "fixed_bit_set_memory_in_bytes": 225820816,
            "max_unsafe_auto_id_timestamp": 1667181538156,
            "file_sizes": {}
        },
        "mappings": {
            "field_types": [
                {
                    "name": "alias",
                    "count": 91,
                    "index_count": 30,
                    "script_count": 0
                },
                {
                    "name": "boolean",
                    "count": 1508,
                    "index_count": 44,
                    "script_count": 0
                },
                {
                    "name": "byte",
                    "count": 17,
                    "index_count": 17,
                    "script_count": 0
                },
                {
                    "name": "constant_keyword",
                    "count": 132,
                    "index_count": 44,
                    "script_count": 0
                },
                {
                    "name": "date",
                    "count": 2495,
                    "index_count": 110,
                    "script_count": 0
                },
                {
                    "name": "double",
                    "count": 5291,
                    "index_count": 22,
                    "script_count": 0
                },
                {
                    "name": "flattened",
                    "count": 300,
                    "index_count": 25,
                    "script_count": 0
                },
                {
                    "name": "float",
                    "count": 4545,
                    "index_count": 46,
                    "script_count": 0
                },
                {
                    "name": "geo_point",
                    "count": 260,
                    "index_count": 30,
                    "script_count": 0
                },
                {
                    "name": "integer",
                    "count": 15,
                    "index_count": 1,
                    "script_count": 0
                },
                {
                    "name": "ip",
                    "count": 613,
                    "index_count": 49,
                    "script_count": 0
                },
                {
                    "name": "keyword",
                    "count": 38078,
                    "index_count": 110,
                    "script_count": 0
                },
                {
                    "name": "long",
                    "count": 51749,
                    "index_count": 91,
                    "script_count": 0
                },
                {
                    "name": "match_only_text",
                    "count": 1495,
                    "index_count": 23,
                    "script_count": 0
                },
                {
                    "name": "nested",
                    "count": 335,
                    "index_count": 35,
                    "script_count": 0
                },
                {
                    "name": "object",
                    "count": 56777,
                    "index_count": 74,
                    "script_count": 0
                },
                {
                    "name": "scaled_float",
                    "count": 2976,
                    "index_count": 30,
                    "script_count": 0
                },
                {
                    "name": "text",
                    "count": 1060,
                    "index_count": 110,
                    "script_count": 0
                },
                {
                    "name": "version",
                    "count": 6,
                    "index_count": 6,
                    "script_count": 0
                },
                {
                    "name": "wildcard",
                    "count": 425,
                    "index_count": 25,
                    "script_count": 0
                }
            ],
            "runtime_field_types": []
        },
        "analysis": {
            "char_filter_types": [],
            "tokenizer_types": [],
            "filter_types": [],
            "analyzer_types": [],
            "built_in_char_filters": [],
            "built_in_tokenizers": [],
            "built_in_filters": [],
            "built_in_analyzers": []
        },
        "versions": [
            {
                "version": "7.1.1",
                "index_count": 2,
                "primary_shard_count": 6,
                "total_primary_size": "607.9kb",
                "total_primary_bytes": 622499
            },
            {
                "version": "7.3.0",
                "index_count": 1,
                "primary_shard_count": 1,
                "total_primary_size": "675.5kb",
                "total_primary_bytes": 691792
            },
            {
                "version": "7.4.0",
                "index_count": 4,
                "primary_shard_count": 4,
                "total_primary_size": "810.5kb",
                "total_primary_bytes": 830033
            },
            {
                "version": "7.6.1",
                "index_count": 2,
                "primary_shard_count": 2,
                "total_primary_size": "846.5kb",
                "total_primary_bytes": 866918
            },
            {
                "version": "7.7.0",
                "index_count": 4,
                "primary_shard_count": 4,
                "total_primary_size": "786kb",
                "total_primary_bytes": 804865
            },
            {
                "version": "7.13.2",
                "index_count": 2,
                "primary_shard_count": 2,
                "total_primary_size": "2.6mb",
                "total_primary_bytes": 2813698
            },
            {
                "version": "7.16.1",
                "index_count": 90,
                "primary_shard_count": 90,
                "total_primary_size": "543.2gb",
                "total_primary_bytes": 583295152244
            },
            {
                "version": "7.17.6",
                "index_count": 13,
                "primary_shard_count": 13,
                "total_primary_size": "17.7gb",
                "total_primary_bytes": 19005832429
            },
            {
                "version": "7.17.7",
                "index_count": 19,
                "primary_shard_count": 19,
                "total_primary_size": "36.9gb",
                "total_primary_bytes": 39639376900
            }
        ]
    },
    "nodes": {
        "count": {
            "total": 5,
            "coordinating_only": 0,
            "data": 3,
            "data_cold": 0,
            "data_content": 0,
            "data_frozen": 0,
            "data_hot": 0,
            "data_warm": 0,
            "ingest": 3,
            "master": 2,
            "ml": 0,
            "remote_cluster_client": 0,
            "transform": 0,
            "voting_only": 0
        },
        "versions": [
            "7.17.7"
        ],
        "os": {
            "available_processors": 32,
            "allocated_processors": 32,
            "names": [
                {
                    "name": "Linux",
                    "count": 5
                }
            ],
            "pretty_names": [
                {
                    "pretty_name": "Ubuntu 20.04.5 LTS",
                    "count": 4
                },
                {
                    "pretty_name": "Ubuntu 16.04.6 LTS",
                    "count": 1
                }
            ],
            "architectures": [
                {
                    "arch": "amd64",
                    "count": 5
                }
            ],
            "mem": {
                "total": "50.6gb",
                "total_in_bytes": 54402265088,
                "free": "1.5gb",
                "free_in_bytes": 1612709888,
                "used": "49.1gb",
                "used_in_bytes": 52789555200,
                "free_percent": 3,
                "used_percent": 97
            }
        },
        "process": {
            "cpu": {
                "percent": 4
            },
            "open_file_descriptors": {
                "min": 490,
                "max": 1324,
                "avg": 994
            }
        },
        "jvm": {
            "max_uptime": "4.1d",
            "max_uptime_in_millis": 355478809,
            "versions": [
                {
                    "version": "19",
                    "vm_name": "OpenJDK 64-Bit Server VM",
                    "vm_version": "19+36-2238",
                    "vm_vendor": "Oracle Corporation",
                    "bundled_jdk": true,
                    "using_bundled_jdk": true,
                    "count": 5
                }
            ],
            "mem": {
                "heap_used": "11.7gb",
                "heap_used_in_bytes": 12638348888,
                "heap_max": "26gb",
                "heap_max_in_bytes": 27917287424
            },
            "threads": 398
        },
        "fs": {
            "total": "2.3tb",
            "total_in_bytes": 2629944565760,
            "free": "1.2tb",
            "free_in_bytes": 1319595266048,
            "available": "1.1tb",
            "available_in_bytes": 1210035482624
        },
        "plugins": [],
        "network_types": {
            "transport_types": {
                "security4": 5
            },
            "http_types": {
                "security4": 5
            }
        },
        "discovery_types": {
            "zen": 5
        },
        "packaging_types": [
            {
                "flavor": "default",
                "type": "deb",
                "count": 5
            }
        ],
        "ingest": {
            "number_of_pipelines": 8,
            "processor_stats": {
                "community_id": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "conditional": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "convert": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "foreach": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "gsub": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "registered_domain": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "remove": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "rename": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "script": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "set": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                }
            }
        }
    }
}

All versions are 7+ from what I can see.
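As a quick cross-check, the `indices.versions` list in the stats output can be reduced to the oldest creation version. This is just a sketch; the `stats` dict below is a trimmed copy of three entries from the output above, and the function name is my own.

```python
def oldest_index_version(cluster_stats: dict) -> str:
    """Return the lowest index creation version listed in _cluster/stats."""
    versions = [entry["version"] for entry in cluster_stats["indices"]["versions"]]
    # Compare as integer tuples so that 7.13.2 sorts after 7.3.0.
    return min(versions, key=lambda v: tuple(int(p) for p in v.split(".")))


# Trimmed excerpt of the "versions" section posted above.
stats = {
    "indices": {
        "versions": [
            {"version": "7.1.1", "index_count": 2},
            {"version": "7.3.0", "index_count": 1},
            {"version": "7.13.2", "index_count": 2},
        ]
    }
}

print(oldest_index_version(stats))
```

For the excerpt this gives 7.1.1, which matches reading the list by eye: nothing in the live cluster was created on 6.x.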

Not sure if it's related or not, but I noticed today when grasping at straws that some of the Index Lifecycle Policies have a count of attached indices, but when I click on that number to see them, it returns an empty list.

Click on the 19, and I get this:

Again, I am not sure if that's just a red herring, or if it's somehow related...

That is odd! Let me ask if anyone might be able to provide other assistance.

You should upgrade your hosts to be on the same OS btw :wink:

Thanks mate, much appreciated!

Yeah, I missed one... :man_facepalming:

I've run into this exact same issue. I'm upgrading from 7.17.0 to 8.4.3, and the upgrade advisor shows no remaining issues. https://localhost:9200/.*/_settings?human shows all system indices were created with 7.x, and I cannot for the life of me find which index was created with 6.x.

It would be really helpful if the fatal error message "cannot upgrade node because incompatible indices created with version xxx exist" had the name of the incompatible indices.

Edit to add: I have three clusters to upgrade that have had roughly the same lifecycle (test/stg/prod), and I ran into this while doing the stg cluster upgrade. I did not run into it on the test cluster, despite the two being configured almost exactly the same.

It looks like maybe the node metadata is out of date or something? See NodeEnvironment.java on the 8.4 branch of elastic/elasticsearch on GitHub.

Is there a command to refresh that? I'm not having the easiest time sorting through where that metadata.oldestIndexVersion() gets set.

I downgraded the stuck node back to 7.17 and let it rejoin the cluster and somehow that seems to have refreshed the oldest_index_version metadata and now it worked when I tried again to upgrade to 8.

Thanks for the tip, Emily! I'll give that a try with this node as well and see how it goes.

100% agree that the error message would be more useful if it listed the problematic indices.

Just to confirm that @Emily_Nicholson is right here. This can happen if you delete the last 6.x index and then immediately shut down a node to start the upgrade, because index deletes are a little asynchronous so the node may not have received the message about the last bad index being deleted before it's shut down.

Temporarily reverting the node back to 7.x and allowing it to rejoin the cluster will refresh the stale metadata. Downgrades don't work in general but in this specific situation it's ok because the 8.x node won't have changed anything on disk by this point.
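To make the failure mode concrete, here is a toy model of the startup check. It is not the actual Java code in NodeEnvironment, just a sketch assuming the check compares the oldest index version recorded in the node's own metadata against the minimum compatible version (7.0.0, per the error message). The function names are illustrative.

```python
from typing import Tuple

Version = Tuple[int, int, int]

# Minimum compatible index version quoted in the 8.4.3 error message.
MINIMUM_COMPATIBLE: Version = (7, 0, 0)


def fmt(version: Version) -> str:
    """Render (6, 5, 4) as '6.5.4'."""
    return ".".join(str(part) for part in version)


def check_for_index_compatibility(stored_oldest: Version) -> None:
    """Refuse to start if the node's recorded oldest index version is too old."""
    if stored_oldest < MINIMUM_COMPATIBLE:
        raise RuntimeError(
            "cannot upgrade node because incompatible indices created with "
            f"version [{fmt(stored_oldest)}] exist, while the minimum "
            f"compatible index version is [{fmt(MINIMUM_COMPATIBLE)}]"
        )


def startup_result(stored_oldest: Version) -> str:
    """Return 'started' or 'refused' for a given recorded oldest version."""
    try:
        check_for_index_compatibility(stored_oldest)
        return "started"
    except RuntimeError:
        return "refused"


# Stale metadata: the node shut down before hearing that the 6.5.4 index was deleted.
print(startup_result((6, 5, 4)))
# Refreshed metadata after rejoining on 7.x: oldest live index is 7.1.1.
print(startup_result((7, 1, 1)))
```

The point of the model is that the check sees only what the node last recorded, not the live cluster, which is why the cluster APIs all report 7.x while the node still refuses to start.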

Unfortunately the 8.x node doesn't really have a way to provide this information - the problem is that the bad indices are too old for an 8.x node to even read their metadata which would be required to name them. However I'm not sure it really would help to know those names - you can't do anything different with this information.

That said, I see room for improvement in the error message and opened this PR:

Thanks for the update @DavidTurner , that's great insight.

I think the enhancements in the PR are a good update.