Elasticsearch data node instability and indexing failures when using NAS (shared storage) with separate folders per node

I am facing stability and performance issues in my Elasticsearch cluster and would like to understand whether the storage architecture is the root cause.

We have multiple Elasticsearch data nodes running on Windows servers. For each node, a separate folder is configured as the data path, for example:

  • Node1 → \\storage\elk\24
  • Node2 → \\storage\elk\25
  • Node3 → \\storage\elk\37
  • Node4 → \\storage\elk\38

Although each node has a different folder, all of these folders are hosted on the same NAS/shared storage system, mapped as the Z: drive.

We are observing the following issues:

  • Bulk indexing failures (HTTP 500 errors)
  • Errors such as:
    • AlreadyClosedException: this ReferenceManager is closed
    • RemoteTransportException
    • UnavailableShardsException
  • Shards becoming unavailable intermittently
  • Cluster not recovering properly at times
  • Locking issues
  • Grafana dashboards taking a long time to load data

We have verified that disk usage is within limits and there are no watermark issues. However, when we tested by moving one node to a local disk, it behaved noticeably more stably than the nodes using shared storage.

I would like to understand:

  • Is using NAS/shared storage (even with separate folders per node) supported for Elasticsearch data paths?
  • Can shared storage cause shard instability, indexing failures, or errors like AlreadyClosedException?
  • Does Elasticsearch require completely isolated local storage per node for stable operation?

Could our current setup (multiple nodes using different folders on the same NAS) be the reason for these issues?

Can I get official documentation on this?

Any guidance would be helpful.

Welcome!

Yes, your storage design is a very plausible root cause.

Short answer:

  • Separate folders are not enough if they all sit on the same NAS/shared storage.
  • Elasticsearch can use remote storage only if that storage behaves exactly like a local disk from the filesystem’s point of view.
  • In practice, direct-attached/local storage is generally preferred and is usually more stable and faster.
  • If your NAS introduces latency, locking quirks, transient I/O stalls, or filesystem semantics that differ from local disk, it can absolutely lead to:
    • shard instability
    • slow recovery
    • indexing failures
    • transport exceptions
    • lock-related problems
    • poor dashboard/query performance

The official documentation puts it this way: "Elasticsearch requires the filesystem to act as if it were backed by a local disk … it will work correctly on properly-configured remote block devices (e.g. a SAN) and remote filesystems (e.g. NFS) as long as the remote storage behaves no differently from local storage."

This is the most important point for your case. A NAS / mapped network drive / shared filesystem is not automatically supported just because it is mounted and writable. It must preserve the same semantics and reliability Elasticsearch expects from local disk.

The docs also note: "Directly-attached (local) storage generally performs better than remote storage because it is simpler to configure well and avoids communications overheads. Some remote storage performs very poorly, especially under the kind of load that Elasticsearch imposes."

That aligns strongly with your observation that the node moved to local disk became more stable.

Answers to your specific questions

Is using NAS/shared storage (even with separate folders per node) supported for Elasticsearch data paths?

It is not “NAS is supported” in a blanket sense. The official position is closer to:

  • Remote storage may work
  • Only if it behaves exactly like local storage
  • You must benchmark and validate it under realistic Elasticsearch load

So if your NAS is exposed as a Windows mapped drive / shared filesystem and has any issues with latency, file locking, caching, metadata operations, or transient disconnects, those issues can make it unsuitable; see the quick probe sketched below.
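
As a first sanity check before a full benchmark (Elastic's Rally tool is the usual way to generate realistic indexing load), you can compare raw write+fsync latency on the NAS folder against a local disk. Here is a minimal sketch; the test path is hypothetical and should be pointed first at the Z: drive, then at local storage:

  import os
  import statistics
  import time

  # Hypothetical test location; point it at the storage you want to measure
  # (e.g. a folder on the Z: NAS drive, then a folder on a local disk).
  TEST_DIR = r"Z:\elk\fsync-probe"

  def fsync_latencies_ms(n=200, block=b"x" * 4096):
      os.makedirs(TEST_DIR, exist_ok=True)
      path = os.path.join(TEST_DIR, "probe.bin")
      flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
      fd = os.open(path, flags)
      samples = []
      try:
          for _ in range(n):
              os.write(fd, block)
              t0 = time.perf_counter()
              os.fsync(fd)  # the durability call Elasticsearch/Lucene depend on
              samples.append((time.perf_counter() - t0) * 1000.0)
      finally:
          os.close(fd)
          os.remove(path)
      return samples

  if __name__ == "__main__":
      s = sorted(fsync_latencies_ms())
      print(f"fsync latency ms: p50={statistics.median(s):.2f} "
            f"p99={s[int(len(s) * 0.99)]:.2f} max={s[-1]:.2f}")

If the NAS shows fsync latencies far above the local disk, or occasional multi-second stalls, that alone can explain the instability. A micro-probe like this is not a substitute for benchmarking under realistic concurrent load.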

Can shared storage cause shard instability, indexing failures, or errors like AlreadyClosedException?

Yes, it can contribute to or trigger these symptoms.

While AlreadyClosedException is not a message that uniquely proves “NAS is the cause,” unstable or slow storage can absolutely lead to cascading shard problems such as:

  • shard closures/reopens
  • failed recoveries
  • lock acquisition problems
  • shard unavailability
  • delayed writes / fsync issues
  • node instability under load

Those can then surface as higher-level errors like:

  • UnavailableShardsException
  • transport exceptions
  • bulk indexing failures
  • lock-related failures
  • slow Grafana queries/dashboard loads

Your note that one node became more stable on local disk is a strong practical indicator.

Does Elasticsearch require completely isolated local storage per node for stable operation?

Best practice: yes, isolated local storage per node is strongly preferred.

More precisely:

  • Each node must have its own data path
  • The storage behind that path should ideally be dedicated local/direct-attached storage
  • Remote/shared storage is only acceptable if it is proven to behave like local disk

So Elasticsearch does not strictly require “local disk only” in all cases, but for self-managed clusters, local isolated storage is the safest and most common recommendation.

Could our current setup (multiple nodes using different folders on the same NAS) be the reason for these issues?

Yes, very possibly.

Even though each node uses a different folder, all nodes still depend on the same shared NAS backend. That creates several risks:

  • shared I/O bottleneck across all nodes
  • latency spikes affecting multiple nodes at once
  • locking/metadata behavior that differs from local disk
  • correlated failures during heavy indexing or recovery
  • poor shard recovery performance
  • cluster instability when the storage stalls

So the fact that the folders are different does not eliminate the architectural risk, because the underlying storage system is still shared.
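
As a quick way to enumerate what each node is actually configured with, here is a minimal sketch against the nodes info API (the endpoint is hypothetical and assumes an unsecured cluster; adjust for yours). Note what it can and cannot tell you: it shows whether the path.data values differ, but it cannot show that those folders sit on the same NAS backend, which is exactly the risk described above:

  import json
  import urllib.request

  # Hypothetical endpoint; any node in the cluster will do.
  ES = "http://localhost:9200"

  with urllib.request.urlopen(f"{ES}/_nodes/settings") as resp:
      nodes = json.load(resp)["nodes"]

  by_path = {}
  for node_id, info in nodes.items():
      # path.data may be a string or a list depending on how it was set
      data_path = info.get("settings", {}).get("path", {}).get("data", "<default>")
      by_path.setdefault(str(data_path), []).append(info["name"])

  for path, names in by_path.items():
      flag = "  <-- shared by multiple nodes!" if len(names) > 1 else ""
      print(f"{path}: {', '.join(names)}{flag}")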

Can I get official documentation on this?

The most relevant official guidance is the Elastic documentation quoted above, from the reference manual's material on path settings and hardware/storage.

HTH

Thank you for your response.

I am trying to understand the correct way to use storage architecture with Elasticsearch.

You mentioned that remote storage may work only if it behaves exactly like local storage, and must be validated under load. I want to clarify how that applies in my case.

Currently, I have configured storage as follows:

  • One shared storage box of 8 TB
  • Mounted on 4 Elasticsearch data nodes
  • On the same drive (Z:), I have created separate folders:
    • data1, data2, data3, data4
  • Each node uses its own folder as path.data
  • All nodes can access the full 8 TB storage

With this setup, I am facing issues like:

  • AlreadyClosedException
  • shard failures
  • indexing (bulk) failures
  • overall instability and slow performance
  • locking issues
  • cluster health becomes red
  • Dashboards not loading

However, when I reconfigured the setup to use isolated/local drives for each server, the cluster started behaving normally and became stable.


I would like to understand the following:

  1. When you say remote storage may work, how should it actually be configured in practice?
  2. In my current SAN setup (same storage, different folders per node), is this considered supported or problematic, and why?
  3. What specific issues can arise with this kind of shared storage architecture, and why?
  4. How exactly should remote storage behave to be considered “like local disk” from Elasticsearch’s perspective (in terms of latency, locking, fsync, etc.)?
  5. Does Elasticsearch officially support SAN/NAS or network drives for path.data, and what are the expected challenges if they are used?

1. You will need to speak to your storage vendor about this. It depends very much on the storage, and there are too many variables. It's unlikely we can help you on this forum, because we only really know about Elasticsearch configuration.

2. We don't really support (or not-support) any particular storage setup. As long as it looks to Elasticsearch like local storage, we expect it to work.

3. The ones you listed, including the exceptions and the slow performance. The reason is that your storage is behaving differently from local storage.

4. It must respond identically to local storage in all ways, and adequately fast: low latency, with all operations (including locking and fsyncs) implemented correctly. (See the sketch at the end of this reply.)

5. We don't support (or not-support) any particular storage configuration; it's up to you to supply storage which behaves like local storage and performs adequately for your needs.
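
To make the locking requirement in (4) concrete: Lucene takes native file locks (write.lock, and Elasticsearch's node.lock in each data path) and assumes the filesystem enforces them across processes. Below is a minimal Windows-only sketch (the probe path is hypothetical) that checks whether an exclusive byte-range lock taken by one process is actually denied to a second process on the same share:

  import msvcrt
  import os
  import subprocess
  import sys

  # Hypothetical probe file on the share under test; point it at the Z: drive.
  LOCK_FILE = r"Z:\elk\lock-probe.bin"

  def open_and_try_lock():
      fd = os.open(LOCK_FILE, os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0))
      if os.fstat(fd).st_size == 0:
          os.write(fd, b"\0")  # make sure there is a byte to lock
      os.lseek(fd, 0, os.SEEK_SET)
      try:
          msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)  # non-blocking exclusive lock
          return fd, True
      except OSError:
          return fd, False

  if __name__ == "__main__":
      if len(sys.argv) > 1 and sys.argv[1] == "child":
          # Child: the parent's lock must be visible here, so expect False.
          _, got_lock = open_and_try_lock()
          print("child acquired lock:", got_lock)
          sys.exit(0)
      fd, got_lock = open_and_try_lock()
      assert got_lock, "parent could not take the initial lock"
      # A correct filesystem must deny the child's attempt while we hold the lock.
      subprocess.run([sys.executable, __file__, "child"])
      os.close(fd)

Running the child side on a different server against the same share tests cross-node enforcement. If the child ever reports True while the parent still holds the lock, the share is not behaving like local disk.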