
Recently, we have been encountering the "CorruptIndexException" frequently, accompanied by the following stacktrace: "org.apache.lucene.index.CorruptIndexException: compound sub-files must have a valid codec header and footer: file is too small (0 bytes) (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/UAS5VDw1Sv6xrrrvoN39Bw/1/index/_52.kdm")))".

Upon checking the "_52.kdm" file, we found that it actually contains 143 bytes. Has anyone else encountered a similar issue?
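As a sanity check, the reported size can be compared directly on disk. A minimal sketch, assuming GNU coreutils `stat` (the `segsize` helper is purely illustrative; the path is the one from the stacktrace above):

```shell
# Print the on-disk size of a file, to compare with the "0 bytes" the
# exception reports; prints "missing" if the file does not exist.
segsize() { stat -c '%s' "$1" 2>/dev/null || echo missing; }

# Path copied from the stacktrace above; adjust it for your own node.
segsize /usr/share/elasticsearch/data/nodes/0/indices/UAS5VDw1Sv6xrrrvoN39Bw/1/index/_52.kdm
```

A mismatch between this number and what the exception reports (0 vs 143 bytes) would suggest the storage layer is returning inconsistent views of the same file.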

We are currently using Elasticsearch version 7.17.5 and spring-data-elasticsearch version 4.4.2.

Almost certainly that means your storage doesn't work correctly under concurrent access. Are you using local disks or something network-attached?

See these docs for more information.

Thanks for your response @DavidTurner. We are using a VM local disk with GlusterFS (the Gluster file system) on top.

Have a look at the following, potentially related issues:

Yeah, GlusterFS isn't a local disk at all, and the error you're seeing indicates that it does not behave like a local disk accurately enough for Elasticsearch. See these docs for more information:

Elasticsearch requires the filesystem to act as if it were backed by a local disk, but this means that it will work correctly on properly-configured remote block devices (e.g. a SAN) and remote filesystems (e.g. NFS) as long as the remote storage behaves no differently from local storage.


Thanks for your response @Christian_Dahlqvist. I'll have a look.

Thanks @DavidTurner. We will go through the docs and will review our setup.

We created a disk on SAN storage and set up a new VM with the disk. Docker and Docker Compose have been installed on the VM, and the same VM functions as a worker node within a swarm cluster.

On the VM, we created a directory named "/data" and mounted it as a GlusterFS volume to ensure persistent data across the worker nodes.

I would like to clarify that the Gluster setup is not based on a shared volume or disk. Each worker node has its own local disk, and the GlusterFS directory is mounted on each node to achieve data persistence.

Could you please confirm if this setup appears to be fine or if it might be the cause of the issue?
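For anyone checking a setup like the one described above, one way to confirm what filesystem a data path is actually mounted on is to query its type. A minimal sketch, assuming GNU coreutils `df` (`fstype` is a hypothetical helper, and `/data` is the directory from the setup above):

```shell
# Print the filesystem type a directory is actually mounted on, to confirm
# whether a data path really sits on GlusterFS (shown as "fuse.glusterfs")
# or on a plain local filesystem such as ext4 or xfs.
fstype() { df --output=fstype "$1" 2>/dev/null | tail -n 1; }

# The /data path is the one described above; adjust as needed.
fstype /data
```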

I can confirm that there is definitely something in your storage setup that is not fine (i.e. doesn't behave like a local disk as ES requires). Since the problem is outside of ES, I can't really help you pin it down further. But I am suspicious of GlusterFS because it has had problems like this before, and GlusterFS 7 in particular has been EOL and unmaintained for years.

As per the docs I linked above:

To narrow down the source of the corruptions, systematically change components in your cluster’s environment until the corruptions stop.

In particular try using a more common filesystem instead of GlusterFS and see if the problems go away.


Hi @DavidTurner ,
We are planning to move our on-prem servers to the Azure cloud, and plan to keep the Elasticsearch data in Azure Files as persistent storage.

Are there any known issues with keeping Elasticsearch data on Azure Files storage?

I know of no issues with Azure persistent storage, but I am also not familiar with its various configuration options and also think we don't run many (any?) tests with it. If you encounter problems, you'll need to contact the Azure folks for help.

Sure, thank you. What are your recommendations for shared storage options for running Elasticsearch with a Docker Swarm setup?

It looks like Azure Files is distributed storage accessed via SMB or NFS. This type of storage can often result in very poor performance and may not necessarily behave like local storage in the way David described is required. I therefore would not rule out that you may experience similar corruption issues with it, but I am also not aware of any reported issues.

I would recommend using premium or standard storage.

Thanks @Christian_Dahlqvist. Even if we go with a Standard/Premium disk, I can't move the container to another worker node in the cluster, since I do not have a shared mount path and I will lose my data when the container moves to another node. So I'm looking for a solution that lets me run the Elasticsearch container on any of the available nodes without losing or corrupting the data.

Looking for some suggestions on this. Kindly help...

I think the problem here is likely Docker Swarm - AIUI it doesn't really work very well with stateful applications like Elasticsearch.


How about Kubernetes/Minikube?

I have no experience using Docker Swarm, so I cannot comment on that.

There are lots of users running Elasticsearch successfully on Kubernetes. It supports persistent volumes, so it does not require shared storage in the same way.
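For illustration, on Kubernetes each Elasticsearch pod typically claims its own volume through a StatefulSet `volumeClaimTemplates` stanza rather than sharing one volume across nodes. A minimal sketch (the names, access mode, and size are assumptions, not a tested manifest):

```yaml
# Sketch only: per-pod volume claims for an Elasticsearch StatefulSet.
# Each pod gets its OWN PersistentVolume; volumes are not shared between
# pods, and data redundancy comes from Elasticsearch replica shards.
volumeClaimTemplates:
  - metadata:
      name: data          # assumed name; must match the pod's volumeMounts entry
    spec:
      accessModes: ["ReadWriteOnce"]   # single-node access, not shared
      resources:
        requests:
          storage: 100Gi               # assumed size
```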

Oh okay. When it comes to Kubernetes persistent volumes, there might be a need to share the volume with other nodes to keep the pod highly available. Also, when we go with more than one replica on multiple nodes, we would need to keep the volume shared between the nodes; only then could the same data be available across all the replicas.
What type of volumes can be used for this requirement?

@Christian_Dahlqvist @DavidTurner Looking for your kind suggestions on this.