Our goal is to run Elasticsearch in Docker containers. So far we have only managed to run it properly on an Azure Virtual Machine.
Our challenge is storage. As I understand it, Elasticsearch needs some form of persistent storage, so /usr/share/elasticsearch/data would have to be mounted from outside the container so that indices don't disappear when the container restarts.
So my question is: what types of storage are supported by Elasticsearch? Can anyone confirm whether it is technically possible to back that volume mount with Azure File Shares or Azure Blob Storage?
Is it even possible to start an Elasticsearch node with a volume mounted at /usr/share/elasticsearch/data? So far we haven't been able to, but I would like confirmation of whether it is possible or not.
If there is any documentation regarding my questions I would appreciate it, and any suggestions would also be highly appreciated.
Elasticsearch doesn't really care what the filesystem is. These docs cover it:
Elasticsearch requires the filesystem to act as if it were backed by a local disk, but this means that it will work correctly on properly-configured remote block devices (e.g. a SAN) and remote filesystems (e.g. NFS) as long as the remote storage behaves no differently from local storage.
Note that "behaves no differently from local storage" is quite a strong constraint, and not all networked storage can satisfy it. I don't think we have experience with either of the options you suggest, but if Azure can confirm that they have the same semantics as local storage then Elasticsearch won't be able to tell the difference.
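If you want a quick first look at a mounted volume before pointing Elasticsearch at it, something like the following Python sketch could help. The choice of checks (fsync, rename, read-back) is my own assumption about what a local disk is normally expected to handle, not a list taken from the Elasticsearch docs, and `probe_basic_semantics` is a made-up helper name. Passing these checks does not prove the storage is suitable; failing them is a strong hint that it is not.

```python
import os
import tempfile


def probe_basic_semantics(data_dir: str) -> dict:
    """Run a few basic filesystem checks inside data_dir and report the results.

    Hypothetical helper: the checks are illustrative assumptions, not an
    official Elasticsearch compatibility test.
    """
    results = {}
    tmp_path = os.path.join(data_dir, "probe.tmp")
    final_path = os.path.join(data_dir, "probe.dat")
    payload = b"elasticsearch-storage-probe"

    # 1. Write a file and ask the OS to flush it to stable storage.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)
        results["fsync_file"] = True
    except OSError:
        results["fsync_file"] = False
    finally:
        os.close(fd)

    # 2. Rename the file into place; the old name should be gone afterwards.
    try:
        os.replace(tmp_path, final_path)
        results["rename"] = (
            os.path.exists(final_path) and not os.path.exists(tmp_path)
        )
    except OSError:
        results["rename"] = False

    # 3. Read the data back and compare it with what was written.
    try:
        with open(final_path, "rb") as handle:
            results["read_back"] = handle.read() == payload
    except OSError:
        results["read_back"] = False

    # Clean up whatever the probe left behind.
    for path in (tmp_path, final_path):
        if os.path.exists(path):
            os.remove(path)

    return results


if __name__ == "__main__":
    # Replace the temporary directory with the mount point you want to test.
    with tempfile.TemporaryDirectory() as directory:
        print(probe_basic_semantics(directory))
```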
Also note the performance considerations in the next paragraph:
The performance of an Elasticsearch cluster is often limited by the performance of the underlying storage, so you must ensure that your storage supports acceptable performance. Some remote storage performs very poorly, especially under the kind of load that Elasticsearch imposes, so make sure to benchmark your system carefully before committing to a particular storage architecture.
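As a very rough starting point for that benchmarking, here is a minimal Python sketch that times small fsynced writes in a directory. The function names, the 4 KiB block size and the number of writes are all assumptions I picked for illustration, and a micro-benchmark like this is no substitute for running a realistic indexing and search workload against the cluster. It can still show quickly whether a remote mount is in the same ballpark as a local disk.

```python
import os
import statistics
import tempfile
import time


def fsync_write_latencies(data_dir, writes=200, block_size=4096):
    """Return per-write latencies (seconds) for small appends followed by fsync.

    Hypothetical helper; the defaults are arbitrary illustration values.
    """
    block = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=data_dir, prefix="bench-")
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the write through to the storage layer
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Summarize latencies in milliseconds (the p99 value is approximate)."""
    ordered = sorted(latencies)
    p99_index = max(0, int(len(ordered) * 0.99) - 1)
    return {
        "median_ms": statistics.median(ordered) * 1000,
        "p99_ms": ordered[p99_index] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Replace the temporary directory with the mount point you want to test.
    with tempfile.TemporaryDirectory() as directory:
        print(summarize(fsync_write_latencies(directory)))
```

Running it once against a local disk and once against the candidate mount gives two sets of numbers to compare, which is more useful than either set on its own.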
Azure File Storage shares run over CIFS. You can try it, but I wouldn't be surprised if it doesn't work properly.
When you say "on docker containers", do you mean running Docker on an Azure VM, using an AKS cluster, or using container instances? In the first two cases you should be able to mount an Azure "managed disk" for the Elasticsearch data directory, which works exactly like a local filesystem. For container instances, I think they only support file shares rather than managed disks.
Blob storage is something else entirely - you can use Azure Blob Storage as a snapshot repository but not as a primary data filesystem.
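For the snapshot repository route, registration goes through the `_snapshot` REST API with a repository of type `azure`. Below is a minimal Python sketch that only builds the request. The URL, repository name, container name and the helper name `build_azure_repository_request` are placeholders I made up, and it assumes the Azure storage credentials are already configured on the nodes. Depending on your Elasticsearch version, Azure repository support may need to be installed as a plugin first.

```python
import json
import urllib.request


def build_azure_repository_request(es_url, repo_name, container, base_path=None):
    """Build (but do not send) a PUT request registering an Azure snapshot repository.

    Hypothetical helper; all argument values are supplied by the caller.
    """
    settings = {"container": container}
    if base_path:
        settings["base_path"] = base_path
    body = {"type": "azure", "settings": settings}
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent to a cluster here.
    request = build_azure_repository_request(
        "http://localhost:9200", "my_backup", "es-snapshots"
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`, plus whatever authentication your cluster requires, which is left out of this sketch.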