Hello,
We are currently running a self-managed Elastic deployment on Azure VMs.
Due to the vast amount of data ingested (100 TB+), we are using Azure's premium SSD managed disks for our hot and cold data nodes.
We are trying to move away from premium SSDs and are considering Azure Blob Storage instead. There seems to be a way to do this using BlobFuse2.
Is it possible to replace the SSD managed disks with Azure Blob Storage for our hot and cold data nodes?
The contents of the path.data directory must persist across restarts, because this is where your data is stored. Elasticsearch requires the filesystem to act as if it were backed by a local disk, but this means that it will work correctly on properly-configured remote block devices (e.g. a SAN) and remote filesystems (e.g. NFS) as long as the remote storage behaves no differently from local storage. You can run multiple Elasticsearch nodes on the same filesystem, but each Elasticsearch node must have its own data path.
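To illustrate what "its own data path" means, here is a minimal sketch of the relevant elasticsearch.yml setting for two nodes sharing one mounted filesystem (the mount point and directory names are placeholders):

```
# elasticsearch.yml on node-1 (paths are illustrative)
path.data: /mnt/esdata/node-1

# elasticsearch.yml on node-2: same filesystem, separate data path
path.data: /mnt/esdata/node-2
```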
The performance of an Elasticsearch cluster is often limited by the performance of the underlying storage, so you must ensure that your storage supports acceptable performance. Some remote storage performs very poorly, especially under the kind of load that Elasticsearch imposes, so make sure to benchmark your system carefully before committing to a particular storage architecture.
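If you want to put rough numbers on a candidate mount before committing to it, a generic disk benchmark is one place to start. A minimal sketch using fio, where the directory and all workload parameters are illustrative assumptions rather than a tuned Elasticsearch-like workload:

```
# Illustrative random-read test against a candidate data directory
fio --name=es-randread --directory=/mnt/blobfuse-test \
    --rw=randread --bs=4k --size=2g --numjobs=4 --iodepth=32 \
    --direct=1 --runtime=60 --time_based --group_reporting
```

Note that a tool like Rally, which exercises Elasticsearch itself, gives a more realistic picture than a raw disk benchmark.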
In my experience, FUSE-based filesystems fail to satisfy the "behaves no differently from local storage" constraint, and they also tend to have fairly poor performance overall, but I've never used this particular one.
The best approach would be to use searchable snapshots: this feature is specifically designed to make the best use of blob storage.
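To sketch the shape of that approach, assuming an Azure snapshot repository is available in your version (the repository, container, snapshot, and index names below are placeholders):

```
# Register a snapshot repository backed by Azure Blob Storage
PUT _snapshot/azure_repo
{
  "type": "azure",
  "settings": {
    "container": "es-snapshots",
    "client": "default"
  }
}

# Mount an index from an existing snapshot as a searchable snapshot
POST _snapshot/azure_repo/nightly-snapshot/_mount?wait_for_completion=true
{
  "index": "my-index"
}
```

In practice you would normally let ILM do the mounting for you, via the searchable_snapshot action in the cold or frozen phase, rather than calling _mount by hand. Also note that searchable snapshots are a licensed feature.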