Hi
I have a two-node cluster with IPs "0.0.0.1" and "0.0.0.2". One of the VMs ("0.0.0.2") suddenly stopped, and after I started it again the cluster health was RED. I then restarted both VMs, after which the message below appeared in their logs and I could not log in at https://0.0.0.1:9200 or https://0.0.0.2:9200.
[2022-10-08T08:40:07,003][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [node-1] collector [cluster_stats] failed to collect data
org.elasticsearch.action.UnavailableShardsException: at least one primary shard for the index [.security-7] is unavailable
at org.elasticsearch.xpack.security.support.SecurityIndexManager.getUnavailableReason(SecurityIndexManager.java:147) ~[?:?]
at org.elasticsearch.xpack.security.authc.esnative.NativeUsersStore.getUserCount(NativeUsersStore.java:167) ~[?:?]
at org.elasticsearch.xpack.security.authc.esnative.NativeRealm.lambda$usageStats$1(NativeRealm.java:56) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136) ~[elasticsearch-7.16.1.jar:7.16.1]
at org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.lambda$usageStats$5(CachingUsernamePasswordRealm.java:249) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136) ~[elasticsearch-7.16.1.jar:7.16.1]
at org.elasticsearch.xpack.core.security.authc.Realm.usageStats(Realm.java:140) ~[?:?]
at org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.usageStats(CachingUsernamePasswordRealm.java:247) ~[?:?]
at org.elasticsearch.xpack.security.authc.esnative.NativeRealm.usageStats(NativeRealm.java:56) ~[?:?]
at org.elasticsearch.xpack.security.authc.Realms.usageStats(Realms.java:388) ~[?:?]
at org.elasticsearch.xpack.security.SecurityFeatureSet.usage(SecurityFeatureSet.java:165) ~[?:?]
at org.elasticsearch.xpack.core.action.TransportXPackUsageAction.lambda$masterOperation$2(TransportXPackUsageAction.java:86) ~[?:?]
at org.elasticsearch.xpack.core.common.IteratingActionListener.onResponse(IteratingActionListener.java:135) ~[?:?]
at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47) ~[elasticsearch-7.16.1.jar:7.16.1]
at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-7.16.1.jar:7.16.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777) ~[elasticsearch-7.16.1.jar:7.16.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-7.16.1.jar:7.16.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Then, based on this link, I took the steps below to resolve the issue:
1- Define a new user:
elasticsearch-users useradd restore_user -p xxxxxxx -r superuser
2- Delete the corrupt index:
curl -u restore_user -k -X DELETE "https://localhost:9200/.security-*"
3- Restart all nodes.
After these steps, I was able to log in with the new user on the Elasticsearch node where I had defined it, but all previous users and roles had vanished and I had to define them manually again.
How can I handle this issue without having to define the users and roles again?
Also, the cluster health is RED and Kibana monitoring shows two unassigned shards, but in the indices section the status of every index is green.
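In case it helps with the unassigned shards, here is a rough sketch of how they could be listed and how the cluster could be asked for a reason. It assumes the restore_user from step 1 and a node reachable at https://localhost:9200; the function names and the ES_URL / ES_USER variables are made up for illustration, and I have not verified the output on this cluster.

```shell
#!/usr/bin/env bash
# Assumptions (not verified on this cluster): restore_user exists as in step 1,
# the node listens on https://localhost:9200 and uses a self-signed cert (-k).
ES_URL="${ES_URL:-https://localhost:9200}"
ES_USER="${ES_USER:-restore_user}"

# Keep only rows whose state column is UNASSIGNED.
# Expects _cat/shards rows in the order: index shard prirep state unassigned.reason
filter_unassigned() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}

# List every unassigned shard together with the recorded reason.
list_unassigned() {
  curl -sS -k -u "$ES_USER" \
    "$ES_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | filter_unassigned
}

# With no request body, the allocation explain API reports on an
# arbitrary unassigned shard chosen by the cluster.
explain_unassigned() {
  curl -sS -k -u "$ES_USER" "$ES_URL/_cluster/allocation/explain?pretty"
}
```

Calling list_unassigned and then explain_unassigned from a shell on one of the nodes should show which index the two unassigned shards belong to and why they are not allocated.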
Regards