Kibana UI is not accessible, the cluster is broken after several data folders in `/var/lib/elasticsearch/nodes/0/indices` were deleted

I encountered an issue when the Elasticsearch cluster ran out of disk space. At the time, the only option seemed to be deleting some of the folders under `/var/lib/elasticsearch/nodes/0/indices`.

Specifically, I deleted them on the hot node. After that, I restarted the Elasticsearch service, and then the elastic user could no longer be used because authentication failed, and the Kibana UI became inaccessible.

Please help me. Can the data from the other nodes be copied using scp or other tools so that it can be read by the new cluster I create?

Hello @galuh_tirta

Ideally, you should not delete Elastic Stack data directly from the data folder; instead, use Kibana or ILM policies to have older indices/data deleted automatically.
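For example, old data can be removed through the API or an ILM delete phase rather than from disk (a sketch — the index name `logs-2024.01.15` and policy name `logs-cleanup` below are hypothetical):

```console
# Delete an old index through the API instead of removing its folder
DELETE /logs-2024.01.15

# Or define an ILM policy (hypothetical name "logs-cleanup") that
# deletes indices automatically once they are 30 days old
PUT _ilm/policy/logs-cleanup
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```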

Now, for the current situation, it all depends on your environment:

  1. How many data nodes were in your cluster? You deleted the data on only one of the nodes, right?

  2. If you have multiple data nodes and replicas were set to 1 for all indices, Elasticsearch should be able to recreate the missing shards on this node from the replicas.

  3. If you are taking daily snapshots of your cluster, you may be able to restore the indices, though any data written since the last snapshot will be lost.

  4. When you start the Elasticsearch cluster, what error messages appear in the logs?
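The checks above can be run against the cluster's REST API, for example:

```console
# Overall cluster status and number of data nodes
GET _cluster/health

# Per-index health, primary and replica counts
GET _cat/indices?v&h=index,health,pri,rep

# Explanation of why a shard is unassigned
GET _cluster/allocation/explain
```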

Thanks!!

Yes, that’s correct — I deleted some data folders on the hot nodes.

Let me explain the environment a bit:

  • Node 1 as the master node
  • Node 2 as the hot data node
  • Node 3 as the warm data node

I deleted data in `/var/lib/elasticsearch/nodes/0/indices/` on Node 2 in the hope of reducing disk usage so that Elasticsearch could run normally.

Then I restarted Elasticsearch and the error message I got was:

```
Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]
```

Unfortunately, because of storage concerns, the replica for all indices was set to 0.

If this SIEM cluster is broken, would it be possible to copy or read the data from this SIEM cluster into another SIEM cluster?

Maybe by using scp to copy the data to a new SIEM cluster, or by joining the cluster?

thanks

It's almost certainly not possible to fully recover from this. See e.g. these docs:

WARNING: Don’t modify anything within the data directory or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the data directory, then Elasticsearch may fail, reporting corruption or other data inconsistencies, or may appear to work correctly having silently lost some of your data.

Realistically all I can recommend is to build a fresh cluster and restore any missing data from a recent snapshot taken before you deleted anything.
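If a snapshot repository exists, the restore can be driven through the snapshot API — a sketch, assuming a repository named `my_backup` and a snapshot named `snapshot_1` (both hypothetical):

```console
# List the snapshots available in the repository
GET _snapshot/my_backup/_all

# Restore selected indices into the new cluster
POST _snapshot/my_backup/snapshot_1/_restore
{
  "indices": "logs-*",
  "include_global_state": false
}
```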

Is there no way to, for example, copy the `.security` files from the warm node to the hot node so that the `elastic` user can authenticate again?

I think Elasticsearch would already have done so if it could. The `.security` index is not stored on every node.

In fact, since replicas were set to 0 for all indices, the `.security` index would not have had any copies elsewhere.
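On a working cluster, which nodes hold copies of the security index can be checked with the cat shards API, e.g.:

```console
# Show which node holds each shard of the security index
GET _cat/shards/.security*?v&h=index,shard,prirep,state,node
```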