**The Context**

I am a C++ developer working on an on-premise Visual Analytics application. Our service (DocumentStore) manages an Elasticsearch cluster (handling logs, events, and configuration). We need to implement a Backup & Restore feature.

**The Constraints (Why this is hard)**

We deploy into strict on-prem environments (often air-gapped) on both Windows and Linux.

- **No NAS Guarantee:** We cannot force customers to provision a shared network drive (NFS/SMB) for the cluster.
- **Portability (the "Tarball" Requirement):** Users need to click a button, generate a backup, and download a single `.tar.gz`/`.zip` file. They might move this file to a different environment to restore it.
- **Cluster Support:** We must support multi-node ES clusters.

**The Problem**

Since we don't have a shared filesystem, we can't use the standard `fs` repository type. If we point Node A and Node B to a local path (e.g., `C:\Backups`), the Master Node can't verify the integrity of the snapshot because the files are split across physical machines (a "split-brain" filesystem).
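
For context, this is the standard shared-filesystem setup we are trying to avoid: a minimal sketch where host names and paths are placeholders, and the REST call is shown via Python's `requests` for brevity.

```python
import requests

# Every node's elasticsearch.yml must whitelist the SAME mount, e.g.:
#   path.repo: ["/mnt/es-backups"]   # has to be an NFS/SMB share visible to all nodes
#
# Register the shared-filesystem repository (placeholder host/paths):
resp = requests.put(
    "http://es-master:9200/_snapshot/fs_backup_repo",
    json={"type": "fs", "settings": {"location": "/mnt/es-backups"}},
    timeout=30,
)
resp.raise_for_status()
# Without a real shared mount, repository verification fails because each
# node writes to its own local copy of /mnt/es-backups.
print(resp.json())
```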

**The Proposed Solution: "The Local S3 Sidecar"**

To solve the "No NAS" problem, I am planning to bundle a lightweight S3-compatible server (sidecar) alongside our C++ service.

- **The Setup:** Our service launches an S3 gateway (e.g., SeaweedFS or Rclone) listening on port 8333 on the local machine.
- **The Config:** We configure Elasticsearch (via `repository-s3`) to write to `http://<Our_Service_IP>:8333` (see the first sketch after this list).
- **The Result:** ES streams data over TCP. It thinks it is writing to the cloud, but it's actually writing to our service's local disk.
- **The Export:** When the user clicks "Download," our service zips the local data directory and serves it (also sketched below).
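
For concreteness, here is a minimal sketch of how I expect the registration step to look, assuming the `repository-s3` integration is installed and its client settings point at the sidecar. All host names, bucket names, credentials, and the exact client settings below are placeholders/assumptions (check them against your ES version); the REST calls are shown via Python's `requests`.

```python
import requests

# Assumed client settings in every node's elasticsearch.yml (placeholders):
#   s3.client.default.endpoint: "<Our_Service_IP>:8333"
#   s3.client.default.protocol: "http"
#   s3.client.default.path_style_access: true
# Credentials for the sidecar go into the ES keystore:
#   bin/elasticsearch-keystore add s3.client.default.access_key
#   bin/elasticsearch-keystore add s3.client.default.secret_key

ES = "http://es-master:9200"  # placeholder

# Register the repository against the local S3 sidecar.
resp = requests.put(
    f"{ES}/_snapshot/sidecar_repo",
    json={"type": "s3", "settings": {"bucket": "es-backups", "client": "default"}},
    timeout=30,
)
resp.raise_for_status()

# Ask the cluster to verify that every node can reach the repository.
requests.post(f"{ES}/_snapshot/sidecar_repo/_verify", timeout=60).raise_for_status()
```

The `_verify` step is where the "split-brain" problem would normally surface; with the sidecar, every node talks to the same TCP endpoint, so verification should pass even without a shared mount.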
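
And a rough sketch of the export step, assuming the sidecar lays the repository out as plain files on our service's local disk (paths are placeholders):

```python
import tarfile
from pathlib import Path

def package_backup(repo_dir: str, out_path: str) -> str:
    """Tar+gzip the on-disk snapshot repository so the user can download a single file."""
    repo = Path(repo_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname keeps paths relative, so the archive can be unpacked anywhere.
        tar.add(repo, arcname=repo.name)
    return out_path

# Example (placeholder paths):
# package_backup("/var/lib/docstore/es-backups", "/tmp/es-backup.tar.gz")
```

One caveat I'm already aware of: the directory should only be archived while no snapshot is running, otherwise the tarball may capture a half-written snapshot.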

**The Dilemma: Rclone vs. SeaweedFS**

I am debating between two tools for this "Sidecar" role and would love community feedback:

- **Option A: Rclone (`rclone serve s3`)**
  - Pros: It writes plain, transparent files to disk (a 1:1 mapping with the ES snapshot files). This makes zipping the directory for the "Tarball" requirement instant and simple.
  - Cons: It is a "dumb pipe." No replication. If the node running Rclone dies, that repository is offline.
- **Option B: SeaweedFS**
  - Pros: Native replication. I can run it on multiple nodes and have them sync data active-active, providing High Availability for the backup repo.
  - Cons: It stores data as opaque blobs/volumes. To generate the user's "Tarball," I cannot just zip the disk. I have to "export" the files via API to a temp folder first, effectively doubling the I/O and disk space required during download.

**My Questions to the Community:**

- Is this an anti-pattern? Is running an S3 gateway on local disk specifically to bypass the "Shared Drive" requirement for ES clusters a known recipe for disaster?
- Rclone reliability: Has anyone used `rclone serve s3` as a production target for Elasticsearch snapshots? Is the locking/consistency robust enough for ES?
- SeaweedFS overhead: For those using SeaweedFS, is using it strictly as a "Local Filesystem Abstraction" overkill? Is the overhead of exporting files out of the blob format (to create a zip) painful for datasets in the 10GB-100GB range?

Any insights or war stories from similar air-gapped deployments would be greatly appreciated.