One of my data nodes does not remove data from disk after I delete an index.
Another node which holds 100% of the same data does so without a problem.
The folders in the data directory do not get removed and don't change their size after deleting an index.
Restarting the node or the server did not help at all.
EDIT1: For now I can delete the data manually after deleting an index, since the node does not recognize that it still holds the data. The data is also no longer accessible.
EDIT2: The logs do not say anything at all...
Exactly the same problem here. After a restart, the old replica data is not removed and prevents new replicas from being allocated. The cluster stays in yellow state, no relocation or recovery is happening, and there is nothing in the logs at DEBUG level.
ES 1.6, and I don't want to move to the next version until the current cluster reaches green state and all data is safely replicated and kept in sync.
There are no open files from the supposedly deleted indices.
The storage is a little strange.
Local storage.
2 HDDs, RAID0 with mdadm -> physical volume1
1 HDD -> physical volume2
pv1+pv2 -> logical volume1 (lvm)
We had to go this way as we added one disk just recently and had problems expanding the existing raid.
We have 8 TB of data and 2k shards, so I physically cannot check all of them, but several spot checks showed that there are no open files from those directories.
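For anyone who wants to check every shard directory rather than a few samples, here is a minimal sketch of how that could be automated on Linux by walking /proc/<pid>/fd. The function name open_files_under and the proc_root parameter are made up for illustration, and the script only sees processes whose fd directory it is allowed to read, so it would normally be run as root or as the Elasticsearch user.

```python
import os


def open_files_under(directory, proc_root="/proc"):
    """Return (pid, path) pairs for open file descriptors below `directory`.

    Linux only: relies on /proc/<pid>/fd/<n> being symlinks to the open file.
    Files that were unlinked but are still held open show up with a
    " (deleted)" suffix, which is exactly the case that keeps disk space busy.
    """
    prefix = os.path.realpath(directory) + os.sep
    hits = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed in the meantime
            if target.startswith(prefix):
                hits.append((int(pid), target))
    return hits
```

Calling open_files_under on the node's data directory and getting an empty list back would be consistent with the spot checks described above, although it cannot prove anything about processes it was not permitted to inspect.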
Local storage, HDD, ext4
BTW, this just caused one of the indices to stop serving requests, since the replica quorum wasn't met =__=
I built a script yesterday which matches the indices in the data directories against the output of the cat indices API and deletes (rm -rf on the files) those that should have been deleted.
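The original script was not posted, so the following is only a rough sketch of the same idea, not the actual script. It assumes the 1.x on-disk layout where each index has a directory named after it under something like <path.data>/<cluster_name>/nodes/0/indices, and that the node answers on http://localhost:9200; both are assumptions to adjust. The helper names are invented, and the sketch only reports orphaned directories instead of deleting them.

```python
import os
import urllib.request


def parse_cat_indices(text):
    """Turn the plain-text body of GET /_cat/indices?h=index into a set of names."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def fetch_live_indices(base_url="http://localhost:9200"):
    """Ask the cluster which indices currently exist (base_url is an assumption)."""
    with urllib.request.urlopen(base_url + "/_cat/indices?h=index") as resp:
        return parse_cat_indices(resp.read().decode("utf-8"))


def find_orphaned_dirs(indices_dir, live_indices):
    """List index directories on disk that the cluster no longer knows about.

    Assumes one sub-directory per index, named after the index. Nothing is
    deleted here; the caller decides what to do with the returned paths.
    """
    orphans = []
    for name in sorted(os.listdir(indices_dir)):
        path = os.path.join(indices_dir, name)
        if os.path.isdir(path) and name not in live_indices:
            orphans.append(path)
    return orphans
```

Printing the result of find_orphaned_dirs(indices_dir, fetch_live_indices()) and reviewing it by hand before removing anything seems safer than deleting automatically, since a wrong path or an unreachable cluster would otherwise make every directory look orphaned.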
For some reason the node is 100% working right now. I can delete data with the usual delete REST call.
The disk does not appear to be broken in any way, as everything is fine now... this is so strange.
FYI, we just started with our 2.x cluster. We hope to switch soon, so I don't mind if we do not find the reason for this problem. I just pinged you since this looked like a weird bug which may need to be fixed, if it really is one.
For now it's just very hard to find the reason, as there is no evidence of any kind which could point us in the right direction. And there are so many parameters playing a big role here. I could think of 20 small things which could have been responsible.
I know.
The reason for it is that everything behaved normally. The only difference was that data on one node did not get deleted.
The output of the delete call was the usual acknowledged true.
The logs did not say anything unusual.
But since this problem is solved now for some reason, I am not able to give you more input. Sorry!