Elasticsearch support for Distributed FS like Rook-ceph

Does the Elasticsearch architecture support distributed file systems such as Rook-Ceph for storage?

No, that's not something that is supported or tested.

Is there any particular reason why distributed storage is not supported?
Is there any plan to support it?

It seems somewhat unnecessary since Elasticsearch already has distributed features built in. For instance, it natively scales out across multiple nodes, replicates its data, and automatically recovers from partial failures.
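For illustration, this redundancy is configured per index in Elasticsearch itself, not at the storage layer. Below is a minimal Java sketch that builds, but does not send, a request to the index settings API setting `number_of_replicas` to 1. It assumes Java 11+ (`java.net.http`), a cluster at the hypothetical address `http://localhost:9200`, and a hypothetical index named `my-index`. The class and method names are also hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ReplicaSettings {
    // Hypothetical cluster address; adjust for your environment.
    static final String ES_URL = "http://localhost:9200";

    /** JSON body for the index settings API with the given replica count. */
    static String replicaSettingsBody(int replicas) {
        if (replicas < 0) {
            throw new IllegalArgumentException("replicas must be >= 0");
        }
        return "{\"index\":{\"number_of_replicas\":" + replicas + "}}";
    }

    /** Builds (but does not send) PUT /<index>/_settings. */
    static HttpRequest replicaSettingsRequest(String index, int replicas) {
        return HttpRequest.newBuilder(URI.create(ES_URL + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(replicaSettingsBody(replicas)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = replicaSettingsRequest("my-index", 1);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(replicaSettingsBody(1));
        // To apply it against a live cluster:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

Elasticsearch never allocates a replica on the same node as its primary. Losing one node's disk therefore does not lose the data, which is the redundancy a distributed filesystem would otherwise be asked to provide.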

Remote storage also tends to have much higher latency than local disks, and this can have a big effect on performance.

Elasticsearch presents a particularly stressful workload to the filesystem and turns out to be rather good at hitting filesystem bugs and corner cases that other tests might have missed. This is true even of very mature and well-established filesystems. Distributed filesystems have seen much less production use and therefore seem a much riskier choice. See for instance this post with some links to recent glusterfs bugs:

If you intend to use Ceph object storage or CephFS instead, both of which are built for shared access from many machines in parallel, you will get poor performance and also run a high risk of data corruption.

But with a traditional file system such as ext4, XFS, or NTFS running on Ceph RBD, Elasticsearch should run without problems. This or similar setups are used by many virtual machine hosting providers. The important part here is that access to a Ceph block device is exclusive to one machine, and that the device uses a file system Elasticsearch has been tested with. We have had multiple Elasticsearch clusters running on a setup like that without problems for several years now.

I am using Elasticsearch in a Kubernetes environment with Rook-Ceph block storage, and I am frequently seeing data corruption.
Error log looks like this:

org.elasticsearch.bootstrap.StartupException: ElasticsearchException[java.io.IOException: failed to read [id:0, file:/data/data/nodes/0/_state/node-0.st]]; nested: IOException[failed to read [id:0, file:/data/data/nodes/0/_state/node-0.st]]; nested: CorruptStateException[org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=-637534208 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path="/data/data/nodes/0/_state/node-0.st")))]; nested: CorruptIndexException[codec footer mismatch (file truncated?): actual footer=-637534208 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path="/data/data/nodes/0/_state/node-0.st")))];
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.5.4.jar:6.5.4]
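The two numbers in the `codec footer mismatch` message are easier to read as hex. Lucene's `CodecUtil`, which Elasticsearch uses for its state files, uses `0x3fd76c17` as the codec header magic. It ends each file with a 16-byte footer that starts with the bitwise complement of that value, which is exactly the "expected footer" of -1071082520. Below is a minimal sketch that decodes both values and can optionally check a file. It assumes the Lucene 7.x footer layout used by Elasticsearch 6.5.x: a big-endian magic int, an algorithm id int, and a CRC32 long. The class `FooterCheck` and its methods are hypothetical, not part of Elasticsearch or Lucene.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class FooterCheck {
    // Lucene's CodecUtil starts each codec header with this magic number...
    static final int CODEC_MAGIC = 0x3fd76c17;
    // ...and each footer with its bitwise complement (decimal -1071082520).
    static final int FOOTER_MAGIC = ~CODEC_MAGIC;
    // Assumed footer layout: magic (int) + algorithm id (int) + CRC32 checksum (long).
    static final int FOOTER_LENGTH = 16;

    /** Formats an int as its 32-bit two's-complement bit pattern in hex. */
    static String hex(int v) {
        return String.format("0x%08X", v);
    }

    /** Reads the big-endian int where the footer magic is expected. */
    static int readFooterMagic(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            if (f.length() < FOOTER_LENGTH) {
                throw new IOException(path + " is shorter than a codec footer");
            }
            f.seek(f.length() - FOOTER_LENGTH);
            return f.readInt(); // RandomAccessFile.readInt() reads big-endian
        }
    }

    public static void main(String[] args) throws IOException {
        int reported = -637534208; // "actual footer" from the log above
        System.out.println("expected " + hex(FOOTER_MAGIC) + ", found " + hex(reported));
        // Optionally check a real file, e.g. a copy of node-0.st.
        if (args.length > 0) {
            int magic = readFooterMagic(args[0]);
            System.out.println(args[0] + ": " + hex(magic)
                    + (magic == FOOTER_MAGIC ? " (footer magic OK)" : " (footer magic MISMATCH)"));
        }
    }
}
```

Running it prints `expected 0xC02893E8, found 0xDA000000`. Three of the four bytes found where the footer should start are zero. That is consistent with the "file truncated?" hint, meaning the tail of the file is missing or zero-filled. It does not by itself identify which layer lost the data.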

Right, @chaitra_hegde, that's what I mean. This isn't a supported or tested configuration, and this sort of problem isn't really surprising to me. The solution is to avoid using a distributed filesystem.
