Elastic won't start after upgrade to 8.19.4

We recently attempted to upgrade our cluster from 8.17.4 to 8.19.4 using apt. After the upgrade, the service won’t start and we see this line in the logs:

java.lang.IllegalStateException: Failed to parse mappings for index [[.kibana-observability-ai-assistant-kb-000001/UgwoSBEkQPqRj-ItbDSGdA]]

and:

Caused by: org.elasticsearch.index.mapper.MapperParsingException: Failed to parse mapping: [semantic_text] is available on indices created with 8.11 or higher.

Are we toast?

Well, I would hope not. But did you create a snapshot before the upgrade?

Can you explain a bit more about the current state - the cluster size/topology, how many nodes were upgraded before you saw the error, and how (precisely) did you upgrade?

We have a two-node cluster running on bare metal. So no snapshots. We issued apt upgrade on each node. Edit: I should add that we don’t mind partial data loss. I tried to find the index containing the problematic data, but couldn’t locate it anywhere in the cluster.

A two-node cluster is usually not ideal. I know it's no help now, but FYI you can make snapshots from bare metal too, e.g. to an NFS share. You can at least, now, make a full copy of your data directory, so you can try a few different things on a test system to recover.
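
For next time: once a share is mounted on every node and its path is listed under path.repo in elasticsearch.yml, registering a shared-filesystem snapshot repository is a single API call. A sketch (the repository name and mount point below are just examples):

PUT _snapshot/my_nfs_repo
{
  "type": "fs",
  "settings": {
    "location": "/mnt/elasticsearch-snapshots"
  }
}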

Do all your indices have at least 1 replica?

Did the upgrade complete on both nodes, or just one?

Given it names a specific index with a specific UUID, UgwoSBEkQPqRj-ItbDSGdA, it is (or was) surely there at some point? There should be a directory of that name under the indices subdirectory of your data directory on one or both of the Elasticsearch nodes.
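
For anyone who finds this thread while their cluster is still up, you can map a UUID from an error back to an index name (and check replica counts at the same time) with the cat indices API, something like:

GET _cat/indices?v&h=index,uuid,pri,rep,health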

btw, I thought the Kibana AI Assistant stuff needed a non-basic license, so if you have one you can/should open a support case?

Thanks for the information. I tried deleting the offending directory on both nodes. One node actually seems to start up now, but the cluster doesn’t form because quorum hasn’t been reached. I have replicas of all indices. The upgrade completed on both nodes. I don’t have the AI Assistant thing - I don’t know why that is seen here.

Is there a way to remove the data on the node that’s complaining about the mapping, join it to the cluster ‘empty’ and replicate data?

My understanding is all your data was replicated, and you have one "good" node that starts up, but it can't get fully up because it cannot reach quorum (2), since the other node does not start. The good node, I guess, is just looping, waiting for the second node to join?

Can you share the actual startup logs from both the "good" and "bad" node please, and the elasticsearch.yml file from both nodes.

It looks like maybe the mapping for the .kibana-observability-ai-assistant-kb-000001 index has been updated to have a semantic_text field, but since yours was originally created before 8.11 it can’t be upgraded. This sounds like a bug in the upgrade assistant to me. If possible, I’d restore to 8.17.4 from snapshots, then reindex .kibana-observability-ai-assistant-kb-000001 so that its creation version is 8.17.4, then upgrade to 8.19.4. Or, if you don’t care about that data, you might be able to just delete the index.
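
If the cluster is reachable, you can confirm when the index was created before deciding. Something like the following should show a human-readable created version alongside the internal version number (the human flag and filter_path here are only to trim the output):

GET .kibana-observability-ai-assistant-kb-000001/_settings?human=true&filter_path=*.settings.index.version*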

They have no snapshots, I asked already. And they deleted the directory that corresponds to the index with "bad" mapping manually (from the filesystem) on both nodes.

Their issue now is more that they have a 2-node cluster that won't start, as one node won't come up and the other node can't reach quorum on its own.

btw, I presume @artschooldropout just ran "apt upgrade" on both nodes, maybe a bit carelessly but certainly not expecting this issue, so it's doubtful they looked at the upgrade assistant prior to the upgrade.

Aside: I also noticed that I have .kibana-observability-ai-assistant-kb-000001 and .kibana-observability-ai-assistant-conversations-000001 indices, and I also have no recollection of ever using the "Kibana AI assistant". I think it's fair to say there has been some inflation (some might say bloat) in the "for free" indices that ES clusters get in more recent releases.

Ah, I had missed that. This looks like a bug in the upgrade process, though, so I’m guessing others are going to hit it and find this thread. In that case, I recommend following the steps from my post.

Thanks very much @Keith_Massey and @RainTown for your help on this. We elected to scrap the cluster and start from scratch. In the future we will use snapshots, and follow the official upgrade procedure.

Thank god I saw this thread. I also see these two indices on 8.16.1, and I do not use this feature. Both of these indices have no documents. Will my upgrade to 8.19.4 fail?

We’ve been investigating this more. It appears right now that it is only a problem if you were on 8.10 and you upgrade (directly or indirectly, I believe) to 8.18.7, 8.19.4, 9.0.7, or 9.1.4. If you were on 8.10.0 at any point, I would recommend waiting for detailed instructions before upgrading to 8.19.4. If you can’t wait, definitely make sure you have taken snapshots before upgrading. And before upgrading I would either delete that index or reindex it.

My path was 6.x, through many 7.x versions, to 8.1, then 8.5.3, then 8.16.1, and the next step is 8.19.4.

You are probably fine then. I would still highly recommend making sure your snapshots are up to date before upgrading.
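
For example, the cat snapshots API lists your snapshots with their end times, so you can confirm the most recent one really is recent (the repository name here is a placeholder):

GET _cat/snapshots/my_backup_repo?v&s=end_epoch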


I will try, but the cluster is too big to fit all the snapshots. :grinning_face:

First of all, thanks for looking into this issue that @artschooldropout found/raised. And maybe I was unfair to say that he/she may have been "careless", if all they did was hit a specific and unknown (at the time) bug.

But, I mused above about the growth in indices that get set up by default, even when you don't use the feature they correspond to. Consider it feedback that this is quite annoying for some (long-time) users of the core product.

And, all that said, @artschooldropout - a 2-node cluster is not a great idea: it adds no real redundancy, quorum is still 2, so it's particularly fragile. Please use at least 3 nodes in your next cluster (even a voting-only 3rd node helps)! And make snapshots!
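
A sketch of what that 3rd node could look like in elasticsearch.yml - a dedicated voting-only tiebreaker only needs the master and voting_only roles (the rest of the configuration is omitted here):

node.roles: [ master, voting_only ]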


This has been fixed in Defer Semantic Text Failures on Pre-8.11 Indices by Mikep86 · Pull Request #135845 · elastic/elasticsearch · GitHub. That fix will be available in patch releases for all affected versions very soon. Thanks for reporting this, @artschooldropout. I’m sorry the fix came too late for you. My understanding is that if you get into the situation where your nodes won’t boot because of the bug in this thread, then you can upgrade to the upcoming patch release and your nodes will start.


Sometimes you eat the bear, sometimes the bear eats you. I’m just glad my experience helped prevent this from happening to others. Thanks again for everyone’s help.


Is there anything we can do to fix this issue right now? I moved the AI index to another location but it still complains. Is it recorded somewhere in a mapping table? Or would waiting for 8.19.5 help?

I’m not sure what you mean by “moved the AI index”, but you definitely don’t want to be moving files around on the filesystem, if that’s what you mean.

The best advice is probably to wait for 8.19.5, which ought to be out very soon.

If you can’t wait a day or two, or if upgrading cannot be done quickly for some reason, you can reindex the .kibana-observability-ai-assistant-kb-000001 index. For example:
First reindex .kibana-observability-ai-assistant-kb-000001 into a new index named .kibana-observability-ai-assistant-kb-000002:

POST _reindex
{
    "source": {
        "index": ".kibana-observability-ai-assistant-kb-000001"
    },
    "dest": {
        "index": ".kibana-observability-ai-assistant-kb-000002"
    }
}

Once that succeeds, delete the original index:

DELETE .kibana-observability-ai-assistant-kb-000001

Re-create the original index:

PUT .kibana-observability-ai-assistant-kb-000001

Now re-index back into the original index (this is all because there is no rename in Elasticsearch):

POST _reindex
{
    "source": {
        "index": ".kibana-observability-ai-assistant-kb-000002"
    },
    "dest": {
        "index": ".kibana-observability-ai-assistant-kb-000001"
    }
}

Check that the alias is still intact:

GET .kibana-observability-ai-assistant-kb/

and

GET _cat/aliases/.kibana-observability-ai-assistant-kb?v

If anything is wrong, fix the alias:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": ".kibana-observability-ai-assistant-kb-000001",
        "alias": ".kibana-observability-ai-assistant-kb",
        "is_write_index": true
      }
    }
  ]
}

And once everything is good, delete the extra copy of the index:

DELETE .kibana-observability-ai-assistant-kb-000002