Why NFS is to be avoided for data directories

Yes, latency is a big factor. Another is correctness: Elasticsearch expects the filesystem under the data path to behave like a local filesystem, and it relies on some fairly advanced features that nonlocal storage typically does not support well. For instance, it needs locking and atomic file creation to work correctly, and it's notoriously tricky to get NFS to do these things reliably.
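To make "locking and atomic file creation" a bit more concrete, here is a minimal Python sketch (Unix only) that exercises those two primitives in a directory. This is not how Elasticsearch itself checks anything; the function name `probe_directory` and the file name `.fs_probe.tmp` are made up for illustration. A pass on a single client also doesn't prove a mount is safe, since the hard NFS failures tend to involve several clients, caching, and crashes.

```python
import fcntl
import os


def probe_directory(path):
    """Exercise exclusive create and advisory locking in `path`.

    Returns a dict of booleans. Illustrative only: passing does not
    mean the filesystem is safe for an Elasticsearch data path.
    """
    results = {"exclusive_create": False, "advisory_lock": False}
    probe = os.path.join(path, ".fs_probe.tmp")  # hypothetical file name
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY

    # First exclusive create should succeed (raises if a stale probe exists).
    fd = os.open(probe, flags, 0o600)
    try:
        # A second exclusive create of the same name must be refused.
        try:
            os.close(os.open(probe, flags, 0o600))
        except FileExistsError:
            results["exclusive_create"] = True

        # Try a non-blocking exclusive advisory lock on the open file.
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            results["advisory_lock"] = True
            fcntl.lockf(fd, fcntl.LOCK_UN)
        except OSError:
            pass  # locking unsupported or refused on this filesystem
    finally:
        os.close(fd)
        os.unlink(probe)
    return results
```

You would call it as `probe_directory("/path/to/candidate/dir")` and treat a `False` as a red flag, while treating a `True` as no more than "not obviously broken".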

Distributed storage like Ceph and GlusterFS is something I'd avoid. These technologies are still maturing IMO and have been linked to lost or corrupt data in the recent past. You don't need distributed storage since Elasticsearch handles the distributed side of things for you.
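As a rough illustration of Elasticsearch handling the distributed side itself: redundancy is normally configured per index through the `index.number_of_shards` and `index.number_of_replicas` settings, rather than in the storage layer. The sketch below only builds the body you would send when creating an index; the helper names and the numbers are assumptions for the example, not recommendations.

```python
import json


def index_settings(primary_shards, replicas):
    """Build a create-index request body with shard and replica counts."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(body):
    """Count every shard copy in the cluster: primaries plus their replicas."""
    index = body["settings"]["index"]
    return index["number_of_shards"] * (1 + index["number_of_replicas"])


if __name__ == "__main__":
    body = index_settings(primary_shards=3, replicas=1)
    print(json.dumps(body, indent=2))
    print("shard copies:", total_shard_copies(body))
```

With one replica, every shard exists on two different nodes, so putting replicated storage underneath just duplicates data that Elasticsearch is already duplicating.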

SANs work OK where performance is less important (e.g. the cold tier). I haven't heard of as many correctness issues with SANs as with the other options you mentioned.
