Need best practices for: Proxmox cluster + iSCSI LVM shared storage + elasticsearch cluster

I have 3 servers in a Proxmox cluster with iSCSI LVM shared storage configured, and I was planning to set up a 3-node Elasticsearch cluster, but I don't know what the best configuration would be here. At first I tried to create one Elasticsearch VM on each physical server, each with one small local disk (for hot and warm indices) and a second, bigger disk on the LVM shared storage (for cold indices), but that would mean three big disks on the shared storage, one per node, all holding the same data. I am just beginning my experience with Elasticsearch and would appreciate help. Thank you.

Welcome! Happy to have you join our community of users!

Sorry if this is too much information. I want you to have a positive experience with Elasticsearch, and I fear that will not be the case with network attached storage for the reasons explained below.

The Problem

I have run Elasticsearch on Proxmox (I now use ECK, Elastic Cloud on Kubernetes, instead, which lets you manage multiple clusters on your own k8s deployment). I used LXC containers in my Proxmox deployment rather than VMs. Even though you can use iSCSI LVM storage and assign it to VMs or LXC containers, I would advise against it. Using remote or network storage is generally discouraged as a matter of best practice. I have provided a benchmark comparison example below. The long-story-short version is that the only way to really make network attached storage of any kind competitive with directly attached NVMe or SSD is a SAN, which tends to be very expensive and requires multiple FC or 40Gbps (or faster) network connections to deliver comparable IOPS and transfer speeds (and even then usually falls short, performance-wise). Most of the time when I've seen a SAN in use, it's because the organization already had it, rather than purchased it specifically for Elasticsearch. Even then, when they've conducted performance tests, they've tended to switch to directly attached NVMe or SSD drives (yeah, the difference can be that profound).

The Why

As convenient as it can be to have iSCSI and LVM set up in Proxmox, or similar SAN-type technologies, it's generally much faster to have local SSD or NVMe storage and let Elasticsearch handle replication between nodes (it wants replicas anyway), rather than use network attached storage. When I played with Elasticsearch on my Proxmox cluster, I used LVM-Thin provisioning on my local NVMe drives rather than network attached storage. You should test the throughput in your own environment, both to satisfy your curiosity and to confirm your performance requirements. A good tool for this is fio. I also used it to benchmark different storage approaches in my k8s cluster, where each node has a single 10Gbps network interface that sets the ceiling on network throughput.
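
If it helps, here is a minimal sketch of how you could drive fio and pull out the headline numbers. The mount point, file size, and job parameters are just placeholders to adapt to your own environment; run it once against the local disk and once against a path backed by the iSCSI LVM storage and compare.

```python
# Minimal sketch: run a random-read fio job against a test file on the storage
# you want to measure and report IOPS / bandwidth. Assumes fio is installed and
# /mnt/test is the mount point under test (adjust paths and job size to taste).
import json
import subprocess

FIO_CMD = [
    "fio",
    "--name=randread-test",
    "--filename=/mnt/test/fio-testfile",  # placeholder path on the storage under test
    "--size=4G",
    "--rw=randread",
    "--bs=4k",
    "--iodepth=64",
    "--numjobs=1",
    "--direct=1",             # bypass the page cache so you measure the device, not RAM
    "--ioengine=libaio",
    "--runtime=60",
    "--time_based",
    "--output-format=json",
]

result = subprocess.run(FIO_CMD, capture_output=True, text=True, check=True)
report = json.loads(result.stdout)

read = report["jobs"][0]["read"]
print(f"IOPS:      {read['iops']:.0f}")
print(f"Bandwidth: {read['bw'] / 1024:.0f} MiB/s")  # fio reports bw in KiB/s
```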

A Benchmark

In my k8s cluster, the difference between network storage using OpenEBS (even using Mayastor's NVMe-oF) and truly local-to-the-node NVMe storage was the difference between a measured 403MB/s at 131,220 IOPS with the networked storage vs. 1498MB/s at 487,200 IOPS with the direct storage. The comparison is not even close. Even with a theoretical maximum of 1.25GB/s of network throughput from a single 10Gbps network connection, I was only able to get 403MB/s at best over the network. I'm not sure how your Proxmox nodes are connected to one another, but even with PCIe 4 or 5 level NVMes, you can't exceed the network throughput maximum. My best local result is still below the theoretical maximum of my PCIe 3 NVMes (probably a limitation of my particular NVMe, rather than the bus itself).

As far as topology goes, I would recommend skipping your cold tier and just having hot on local storage and warm on "slower" storage (which is typically spinning disks, rather than network attached). The reasoning is that a "cold" tier without a license is essentially just a warm tier without replica shards. You can keep your warm tier, and as the indices age towards cold, simply remove the replica shard. The data will still reside on the same storage alongside one another, but consume less space. Having a dedicated warm tier on local disk is probably not worth the extra effort to maintain, unless that warm storage is genuinely slower (e.g. spinning disks).
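
As a rough illustration of the "drop the replica as data ages" idea, an ILM policy along these lines could look like the sketch below (ILM is available on the free license). The policy name, timings, and connection details are placeholders, not anything specific to your cluster.

```python
# Sketch of an ILM policy that keeps replicas while an index is hot/warm and
# drops the replica once it ages into "cold" on the same storage.
# Policy name, ages, URL, and credentials are placeholders.
import requests

ES_URL = "http://localhost:9200"   # adjust to your cluster (and TLS/auth setup)
AUTH = ("elastic", "changeme")     # placeholder credentials

policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "7d"}
                }
            },
            "warm": {
                "min_age": "7d",
                "actions": {
                    "set_priority": {"priority": 50}
                }
            },
            "cold": {
                "min_age": "30d",
                "actions": {
                    # "cold" here is just warm data without a replica
                    "allocate": {"number_of_replicas": 0}
                }
            },
        }
    }
}

resp = requests.put(f"{ES_URL}/_ilm/policy/logs-retention", json=policy, auth=AUTH)
resp.raise_for_status()
```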

Also, I still recommend using LXC containers instead of VMs. You still have full control over how much memory and CPU you can assign/dedicate to the container, and it has lower overhead than a full VM.

Thank you very much, that's a lot of useful information. Unfortunately for me :wink: the SAN we have is specifically intended to store old SIEM logs. That's why I am trying to find the best way to fit it into the planned build. It's a new 50TB Dell storage array with 4x 10GBASE-T iSCSI. Is there a way to use local disks for normal cluster operation and also automatically archive old logs to the SAN? Would there then be any way for the cluster to search those old logs, even with slow performance?

To be honest, I would say the best way to achieve that would be to run MinIO on the SAN as an S3-compatible object store, use it as a snapshot repository, and mount your "cold" data on the frozen tier as searchable snapshots. Since that requires a licensed version of Elasticsearch, it may not be in your budget.
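
For reference, registering a MinIO bucket as a snapshot repository is a single API call once the S3 client settings point at MinIO; a sketch is below. The bucket and repository names are placeholders, and I'm assuming the MinIO endpoint and credentials are configured on the Elasticsearch side (e.g. s3.client.default.* settings in elasticsearch.yml plus the keystore). Plain snapshot/restore to this repository works without a license; it's mounting the snapshots as searchable indices on the frozen tier that is the licensed part.

```python
# Sketch: register a MinIO bucket as an S3-compatible snapshot repository.
# Assumes the MinIO endpoint is configured via s3.client.default.* in
# elasticsearch.yml and the access/secret keys are in the Elasticsearch keystore.
# Repository and bucket names are placeholders.
import requests

ES_URL = "http://localhost:9200"   # adjust to your cluster
AUTH = ("elastic", "changeme")     # placeholder credentials

repo = {
    "type": "s3",
    "settings": {
        "bucket": "es-snapshots",  # placeholder bucket served by MinIO on the SAN
        "client": "default",       # matches the s3.client.<name>.* settings
    },
}

resp = requests.put(f"{ES_URL}/_snapshot/minio-repo", json=repo, auth=AUTH)
resp.raise_for_status()
```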

The alternative would be to have both paths mounted: the local SSD/NVMe drive as path.data for the hot nodes, and a mount point from the SAN as path.data for the "warm/cold" nodes. I would then just move the data from hot to warm/cold as needed. You'd still need to run multiple nodes to achieve this, but in my opinion it's the best way to take advantage of that space without access to the (licensed) searchable snapshots feature.
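
One common way to do the "move from hot to warm/cold when you need" part is shard allocation filtering on a custom node attribute, sketched below. The attribute and index names are placeholders: the idea is that each node sets something like node.attr.data: hot (path.data on local SSD/NVMe) or node.attr.data: warm (path.data on the SAN mount) in its elasticsearch.yml, and then demoting an index is a settings update that relocates its shards to the SAN-backed nodes.

```python
# Sketch: manually move an index from the "hot" nodes (local NVMe path.data)
# to the "warm/cold" nodes (SAN-backed path.data) using allocation filtering.
# Assumes each node defines a custom attribute in elasticsearch.yml, e.g.
#   node.attr.data: hot    # nodes whose path.data is local SSD/NVMe
#   node.attr.data: warm   # nodes whose path.data is the SAN mount
# Attribute, index name, URL, and credentials are placeholders.
import requests

ES_URL = "http://localhost:9200"   # adjust to your cluster
AUTH = ("elastic", "changeme")     # placeholder credentials

index = "siem-logs-2024.01"        # placeholder index to demote

settings = {
    "index.routing.allocation.require.data": "warm",  # relocate shards to 'warm' nodes
    "index.number_of_replicas": 0,                     # optionally drop the replica as it ages
}

resp = requests.put(f"{ES_URL}/{index}/_settings", json=settings, auth=AUTH)
resp.raise_for_status()
```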