Detecting Data Loss

Jeff_Bolle · December 22, 2017, 1:43pm

Last night we lost 3 nodes in our cluster. From the logging I have, it looks like they died sequentially over the course of about 2 hours. Why they became unresponsive is not yet known. I was able to SSH into the machines this morning, but was unable to restart the elasticsearch service (the nodes were no longer seen as part of the cluster) or even run ps -aux.
The nodes all have ephemeral disks (Google Cloud). When I rebooted the nodes the amount of free space on each of the disks was substantially higher than before, but the disks were not blank, and I didn't stop the node, so I would have expected the disks to remain intact.

What I'd like help with is understanding how I can tell whether we lost data under these conditions. I could restore the indices from our backups, but I'd like to know if there is a more straightforward way to tell if shards / segments were deleted and data was lost (without having to remember how many docs I should have in each of my indices).

warkolm · December 23, 2017, 7:25am

The easiest way would be to have an X-Pack basic license with Monitoring sending to a secondary cluster; then you could just look at the stats. Without that, you are flying blind.
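
Without Monitoring, a rough substitute is to save the output of `GET _cat/indices?format=json` on a schedule and compare two captures. The sketch below assumes that approach. The function name `doc_count_drops` and the sample index names are made up for illustration. A lower count can also come from legitimate deletes, so treat any hit as a lead rather than proof of loss.

```python
def doc_count_drops(before, after):
    """Compare two parsed `_cat/indices?format=json` captures.

    Returns {index_name: (old_count, new_count)} for every index whose
    document count went down; new_count is None if the index is gone.
    """
    def counts(rows):
        # _cat JSON reports docs.count as a string; it can be null
        # (for example on a closed index), which is treated as 0 here.
        return {r["index"]: int(r.get("docs.count") or 0) for r in rows}

    old, new = counts(before), counts(after)
    drops = {}
    for name, n_old in old.items():
        n_new = new.get(name)
        if n_new is None or n_new < n_old:
            drops[name] = (n_old, n_new)
    return drops


# Hypothetical captures taken before and after the incident.
before = [
    {"index": "logs-a", "docs.count": "100"},
    {"index": "logs-b", "docs.count": "50"},
    {"index": "logs-c", "docs.count": "7"},
]
after = [
    {"index": "logs-a", "docs.count": "100"},
    {"index": "logs-b", "docs.count": "20"},
]
print(doc_count_drops(before, after))
```

In practice the two lists would come from `json.load` on the saved capture files rather than inline literals.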

Jeff_Bolle · December 24, 2017, 2:45am

I'm saving logs off the box and aggregating them. What log line am I looking for?

warkolm · December 24, 2017, 5:14am

We don't log this sort of thing.

Jeff_Bolle · December 24, 2017, 1:40pm

What about if, during startup, a shard that was previously on the node is missing or corrupted? Any sort of indication that there were shenanigans at the FS level beneath elastic?

warkolm · December 24, 2017, 8:30pm

Then it will log that.
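
Besides the startup logs, the cluster itself reports shards it cannot place. As a minimal sketch, assuming you have the parsed output of `GET _cat/shards?format=json` in hand, the helper below (the name `unassigned_shards` and the sample rows are hypothetical) lists every shard copy in the `UNASSIGNED` state. An unassigned primary is the case to worry about; an unassigned replica usually just means the copy is waiting to be rebuilt.

```python
def unassigned_shards(rows):
    """Return (index, shard_number, kind) for each UNASSIGNED shard copy.

    `rows` is the parsed JSON list from `_cat/shards?format=json`;
    the `prirep` column is "p" for a primary and "r" for a replica.
    """
    return sorted(
        (r["index"], int(r["shard"]), "primary" if r["prirep"] == "p" else "replica")
        for r in rows
        if r["state"] == "UNASSIGNED"
    )


# Hypothetical sample rows for illustration only.
rows = [
    {"index": "logs-a", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "logs-a", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
    {"index": "logs-b", "shard": "1", "prirep": "p", "state": "UNASSIGNED"},
]
print(unassigned_shards(rows))
```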

