Why NFS is to be avoided for data directories

DavidTurner · January 16, 2020, 6:25am

Yes, latency is a big factor. Another is correctness: Elasticsearch expects the filesystem under the data path to act like a local filesystem, and uses some fairly advanced features that are typically not well supported by nonlocal storage. For instance, it relies on file locking and atomic file creation behaving correctly, and it's notoriously tricky to get NFS to do either of these reliably.
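To make those two primitives concrete, here is a minimal Python sketch of what "atomic file creation" and "locking" mean at the filesystem level. It is not how Elasticsearch itself does it; the function names (`atomic_create`, `try_exclusive_lock`, `probe_directory`) and the file names are made up for illustration, and it assumes a Unix-like system where the `fcntl` module is available.

```python
import fcntl
import os
import tempfile


def atomic_create(path):
    """Create `path` only if it does not exist yet.

    O_CREAT | O_EXCL asks the filesystem to make "check and create" a single
    step. Returns True if this call created the file, False if it was there.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def try_exclusive_lock(path):
    """Try to take a non-blocking exclusive advisory lock on `path`.

    Returns the open file descriptor while the lock is held (closing it
    releases the lock), or None if someone else already holds the lock.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def probe_directory(directory):
    """Exercise both primitives inside `directory` (hypothetical helper)."""
    marker = os.path.join(directory, "probe.marker")
    lock_file = os.path.join(directory, "probe.lock")

    first_create = atomic_create(marker)    # should succeed
    second_create = atomic_create(marker)   # should be refused
    os.remove(marker)

    holder = try_exclusive_lock(lock_file)     # should succeed
    contender = try_exclusive_lock(lock_file)  # should be refused
    if contender is not None:
        os.close(contender)
    if holder is not None:
        os.close(holder)
    os.remove(lock_file)

    return {
        "exclusive_create_ok": first_create and not second_create,
        "exclusive_lock_ok": holder is not None and contender is None,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(probe_directory(tmp))
```

Note that a probe like this only runs from a single machine. The hard cases on NFS involve several clients, caching and retries, so passing it says nothing about whether a given mount is safe for a data path.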

Distributed storage like Ceph and GlusterFS is something I'd avoid. These technologies are still maturing IMO and have been linked to lost or corrupt data in the recent past. You don't need distributed storage since Elasticsearch handles the distributed side of things for you.

SANs work OK where performance is less important (e.g. the cold tier). I haven't heard of as many correctness issues with SANs as with your other suggestions.

