We have recently been encountering a "CorruptIndexException" frequently, accompanied by the following stacktrace: "org.apache.lucene.index.CorruptIndexException: compound sub-files must have a valid codec header and footer: file is too small (0 bytes) (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/UAS5VDw1Sv6xrrrvoN39Bw/1/index/_52.kdm")))".
Upon checking the "_52.kdm" file, we found that it actually contains 143 bytes. Has anyone else encountered a similar issue?
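A quick way to cross-check this kind of mismatch is to look at the file directly on disk. The sketch below (assuming a Linux host with GNU coreutils; the path is the one from the stacktrace, so adjust it to your own data directory) compares the on-disk size with what Lucene reported and peeks at the codec header. The magic constant is the one Lucene's CodecUtil writes at the start of every codec file.

```shell
# Path of the suspect segment file, taken from the stacktrace above;
# adjust to your own data directory.
FILE=/usr/share/elasticsearch/data/nodes/0/indices/UAS5VDw1Sv6xrrrvoN39Bw/1/index/_52.kdm

# Size as seen through the mounted filesystem; Lucene saw 0 bytes, so a
# mismatch here points at the storage layer rather than at Elasticsearch.
stat -c '%s bytes' "$FILE"

# Every Lucene codec file starts with a fixed 4-byte magic (3f d7 6c 17)
# followed by the codec name; if the first bytes differ, the file is damaged.
hexdump -C "$FILE" | head -n 4
```

If `stat` on the node disagrees with the size Elasticsearch saw (0 vs 143 bytes here), the filesystem is returning inconsistent views of the same file, which is exactly the kind of misbehaviour the error message is complaining about.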
We are currently using Elasticsearch version 7.17.5 and spring-data-elasticsearch version 4.4.2.
Yeah, GlusterFS isn't a local disk at all, and the error you're seeing indicates it does not behave like a local disk accurately enough for Elasticsearch. See these docs for more information:
Elasticsearch requires the filesystem to act as if it were backed by a local disk, but this means that it will work correctly on properly-configured remote block devices (e.g. a SAN) and remote filesystems (e.g. NFS) as long as the remote storage behaves no differently from local storage.
We created a disk on SAN storage and set up a new VM with the disk. Docker and Docker Compose have been installed on the VM, and the same VM functions as a worker node within a swarm cluster.
On the VM, we created a directory named "/data" and mounted it as a GlusterFS volume to ensure persistent data across the worker nodes.
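For reference, a mount like the one described is typically configured along these lines (a sketch only; the server name "gluster1" and volume name "gv0" are placeholders, not details from this setup):

```
# /etc/fstab entry mounting a GlusterFS volume at /data (placeholder names)
gluster1:/gv0  /data  glusterfs  defaults,_netdev  0  0
```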
I would like to clarify that the Gluster setup is not based on a shared volume or disk. Each worker node has its own local disk, and the GlusterFS directory is mounted on each node to achieve data persistence.
Could you please confirm if this setup appears to be fine or if it might be the cause of the issue?
I can confirm that there is definitely something in your storage setup that is not fine (i.e. doesn't behave like a local disk as ES requires). Since the problem is outside of ES, I can't really help you pin it down further. But I am suspicious of GlusterFS because it has had problems like this before, and GlusterFS 7 in particular has been EOL and unmaintained for years.
As per the docs I linked above:
To narrow down the source of the corruptions, systematically change components in your cluster’s environment until the corruptions stop.
In particular try using a more common filesystem instead of GlusterFS and see if the problems go away.
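When swapping components, it helps to confirm which filesystem actually backs the data path before and after each change, so you know whether the index files really moved off GlusterFS. A minimal sketch, assuming the default data path inside the container:

```shell
# Data path to check; adjust if you relocated path.data.
DATA=/usr/share/elasticsearch/data

# Filesystem type column shows e.g. fuse.glusterfs vs ext4/xfs.
df -T "$DATA"

# Mount source and options for the mount point containing the path.
findmnt -T "$DATA"
```

If `df -T` still reports `fuse.glusterfs` after a change, the data never left the distributed filesystem and the test tells you nothing.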
I know of no issues with Azure persistent storage, but I am also not familiar with its various configuration options and also think we don't run many (any?) tests with it. If you encounter problems, you'll need to contact the Azure folks for help.
It looks like Azure File is distributed storage accessed via SMB or NFS. This type of storage can often result in very poor performance and may not necessarily behave like local storage, which, as David described, is required. I therefore would not rule out that you may experience similar corruption issues with this, but I am also not aware of any reported issues.
I would recommend using premium or standard storage.
Thanks @Christian_Dahlqvist. Even if we go with a Standard/Premium disk, I can't move the container to another worker node in the cluster, since I don't have a shared mount path and I would lose my data when the container moves to another node. So I'm looking for a solution that lets me run the Elasticsearch container on any of the available nodes without losing or corrupting the data.
Looking for some suggestions on this. Kindly help...
Oh okay. When it comes to Kubernetes persistent volumes, the volume may need to be shared with other nodes to keep the pod highly available. Also, when we run more than one replica across multiple nodes, the volume needs to be shared between the nodes; only then is the same data available to all the replicas.
What type of volumes can be used for this requirement?