Node does not delete data from disk

lwintergerst · April 22, 2016, 8:29am

Hi,

One of my data nodes does not remove data from disk after I delete an index.

Another node which holds 100% of the same data does so without a problem.

The folders in the data directory do not get removed and don't change their size after deleting an index.

Restarting the node or the server did not help at all.

EDIT1: For now I can delete the data manually after deleting an index, since the node does not recognize that it still holds the data. The data is also no longer accessible.
EDIT2: The logs do not say anything at all...

Luca

Crassirostris · April 22, 2016, 8:49am

+1

Exactly the same problem. After a restart, none of the data from the replicas gets removed, which prevents new replicas from being allocated. The cluster is in yellow state, no relocation or recovery is happening, and nothing shows up in the logs at DEBUG level.

I'm on ES 1.6, and I don't want to move to the next version until the current cluster reaches green state and all data is safely replicated and kept in sync.

lwintergerst · April 22, 2016, 8:49am

I'm on 1.7.
My cluster is green. It's just that deletion is not possible.

Crassirostris · April 22, 2016, 8:51am

Mine would be green too, but it hit the low disk watermark because of this problem.
So I think it's pretty crucial to learn the reasons.

lwintergerst · April 22, 2016, 8:53am

The only reason mine is green is that it started to ignore every watermark there is.

lwintergerst · April 22, 2016, 9:05am

Sorry to bother you @spinscale but this does look like an interesting bug.

spinscale · April 22, 2016, 9:32am

Hey,

Does lsof show any open deleted files from that index/shard?
What kind of storage is this? Local disk? Any unusual filesystems in use?
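
For readers who want to run the same check without lsof, here is a minimal sketch. It assumes Linux, where the symlinks in /proc/<pid>/fd point to a target ending in " (deleted)" once an open file has been unlinked. The function name, the `proc_root` parameter and the paths in the usage line are hypothetical, not taken from this thread.

```python
import os


def find_deleted_open_files(pid, path_prefix, proc_root="/proc"):
    """Return paths under path_prefix that process `pid` still holds open
    although they were already unlinked (Linux /proc layout assumed)."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    suffix = " (deleted)"
    held = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed while we were scanning
        if target.endswith(suffix) and target.startswith(path_prefix):
            held.append(target[: -len(suffix)])
    return sorted(held)
```

Usage would be something like `find_deleted_open_files(es_pid, "/path/to/data")`, run as a user allowed to read that process's fd directory. An empty list means the process is not pinning any deleted files under that path.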

--Alex

lwintergerst · April 22, 2016, 9:44am

There are no open files from the supposedly deleted indices.

The storage is a little strange.
Local storage.
2 HDDs, RAID0 with mdadm -> physical volume1
1 HDD -> physical volume2
pv1+pv2 -> logical volume1 (lvm)

We had to go this way as we added one disk just recently and had problems expanding the existing raid.

Crassirostris · April 22, 2016, 10:39am

We have 8 TB of data and 2k shards, so I physically cannot check all of those shards, but several spot checks showed that there are no open files from those directories.

Local storage, HDD, ext4

BTW, this just caused one of the indices to stop serving requests, since the replica quorum wasn't met =__=

lwintergerst · April 22, 2016, 10:43am

We also run on ext4.

Just so I can compare it to our setup:
How much RAM does your server have?
How many HDDs does your server have, and of what size?

Crassirostris · April 22, 2016, 10:46am

8 nodes, 64 GB RAM, 10 GB JVM Heap
1 HDD on each node, 2TB each

Right now, all nodes have reached the low disk watermark (100 GB).

spinscale · April 23, 2016, 1:44pm

Hey,

Did you check the dmesg output and syslog to make sure there is no hardware issue? I assume writing to the disks works as expected with something like bonnie or dd?

I am not aware of any issues in that regard that were recently fixed or open on 1.x, but maybe I am missing something.

--Alex

lwintergerst · April 23, 2016, 4:41pm

I built a script yesterday which matches the indices in the data directories against the output of the cat indices API and deletes (rm -rf on the files) those that should have been deleted.
For some reason the node is 100% working right now. I can delete data with the usual delete REST call.

The disk does not appear to be broken in any way, as everything is fine now. This is so strange.
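
The original script was not posted, so the following is only a sketch of the matching step under stated assumptions: that each index has a directory named after it inside an `indices` folder of the node's data path (as on 1.x), and that the live index names come from `GET /_cat/indices?h=index`. The function names are hypothetical. It only reports candidates and deliberately leaves the actual removal to the operator.

```python
import os


def parse_cat_indices(text):
    """Extract index names from `GET /_cat/indices?h=index` output
    (one index name per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def find_orphaned_index_dirs(indices_dir, live_indices):
    """Return directory names under indices_dir that the cluster no longer
    lists as an index. Assumes directories are named after the index."""
    live = set(live_indices)
    return sorted(
        name
        for name in os.listdir(indices_dir)
        if os.path.isdir(os.path.join(indices_dir, name)) and name not in live
    )
```

Reviewing the returned list by hand before removing anything is the safer choice, because a directory that looks orphaned could also belong to an index that is still being recovered.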

FYI, we just started with our 2.x cluster. We hope to switch soon, so I don't mind if we do not find the reason for this problem. I just pinged you since this looked like a weird bug which may need to be fixed, if it really is one.
For now it's just very hard to find the reason, as there is no evidence of any kind which could point us in the right direction. And there are so many parameters playing a big role here. I could think of 20 small things which could have been responsible.

Luca

warkolm · April 23, 2016, 10:42pm

It'd be useful if you gave us the output of the delete command, including _cat/indices before and after showing the index in question.

At the moment you haven't given anything concrete to work off.
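
A minimal sketch of how that evidence could be captured in one go, using only the standard library. The base URL and index name in the usage line are placeholders, and `capture_delete` and its `request` parameter are hypothetical names. Note that it really does delete the index it is given.

```python
import urllib.request


def http(method, url):
    """Send a request with no body and return the response text."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def capture_delete(base_url, index, request=http):
    """Record _cat/indices before and after deleting `index`,
    together with the response of the delete call itself."""
    before = request("GET", f"{base_url}/_cat/indices?v")
    response = request("DELETE", f"{base_url}/{index}")
    after = request("GET", f"{base_url}/_cat/indices?v")
    return {"before": before, "delete_response": response, "after": after}
```

For example, `capture_delete("http://localhost:9200", "my-index")` returns the three outputs in a dict that can be pasted into a post, ideally alongside a directory listing of the affected node's data path.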

lwintergerst · April 24, 2016, 5:30am

I know.
The reason for that is that everything behaved normally. The only difference was that the data on one node did not get deleted.
The output of the delete call was the usual {"acknowledged":true}.
The logs did not say anything unusual.

But since this problem is solved now for some reason, I am not able to give you more input. Sorry!
