ES API that returns the locally persisted cluster state, even before it is initialized/recovered

Is there an API that returns the locally persisted cluster state before the cluster state is recovered? I tried /_cluster/state?local, but it does not return the state persisted on disk if the state is not yet initialized.

If there is no such API, I was wondering whether it would make sense to add one. I'm thinking of cases such as quorum loss, where it would make debugging the cluster a lot easier: instead of going through the logs, we could make this local API call and get the state. The persisted state is already parsed when the node boots up.

You're almost certainly talking about getting the state of the system when no master has been elected, because if a master has been elected but the state has not been recovered, then the cluster state API gives useful responses. However, the cluster state API is probably the wrong thing to rely on for investigating election issues, because it doesn't include information about the discovery process, which you'd need. It's also documented as unstable, so you shouldn't rely on it much anyway.

The reason for the lack of such an API is that a sensible orchestration system should already have all the information it needs to organise a proper election. It'd only really be useful for in-depth troubleshooting, and for that kind of work it's usually more appropriate to use logs.

Agreed.
But such an API would prove helpful during quorum loss for figuring out the best surviving node by comparing the term and version in the cluster state output. Right now, the only way to find the best surviving node is from the logs. With such an API, I would be able to automate this process.

I am confused. How does it help to identify the best surviving node, given that all of the surviving nodes might be stale?

The API could provide the cluster state metadata. The node with the highest term and cluster state version would have the latest state, however stale it might be, correct?

No, if you've lost half or more of the master-eligible nodes in the cluster then you may not have any copies of the latest cluster state left, and crucially you cannot even tell whether that's the case or not. All the remaining copies might be stale, and using them may lead to arbitrary data loss. The only safe way to proceed in those circumstances is to restore the cluster from a snapshot.

I understand that.
I am thinking of automating the unsafe bootstrap tool for my clusters. The API above would be used to choose the best surviving node for this tool to perform a best-effort recovery.

This is a very bad idea.
