Anyone have some practical guidance on sizing the disk space needed for ES
gateways? If I have 100TB in indices and a replication factor of 3, I'd
expect 300TB of indices in the cluster. What size should I expect my
gateway data to take? What factors impact it? Specifically:
- Gateway type (guessing the Hadoop option depends on the replication factor)
- Number of machines in the cluster
- Snapshot frequency
- ...
If it's not an exact science, even some rule of thumb guidance would be
helpful. Thanks,
It depends on the gateway. First, regardless of the gateway used, the
cluster itself will need the full size (300TB in your case) spread among the
nodes. If you use the local gateway, you don't need any additional space:
on full shutdown and restart, the cluster restores its state from the local
storage of each node (hence the name), and the replicas provide high
availability. If you use a shared gateway (fs, hadoop, s3), the indices are
snapshotted (copied) to the shared gateway, and the space required for that
is the size of the "primary shards" (without replicas), in your case 100TB.
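The arithmetic above can be sketched in a few lines. This is just a back-of-the-envelope helper, not anything from Elasticsearch itself; the function name and the idea of treating the replication factor as the total number of copies are my assumptions:

```python
# Back-of-the-envelope gateway sizing (assumption: a shared gateway stores
# one copy of the primary shards only, as described above).
def cluster_and_gateway_size(primary_tb, replication_factor, gateway):
    """Return (total cluster storage, extra gateway storage) in TB."""
    # Every copy of the data lives on the nodes' local disks.
    cluster = primary_tb * replication_factor
    if gateway == "local":
        extra = 0  # state is restored from each node's own disk
    else:  # shared gateway: fs, hadoop, s3
        extra = primary_tb  # one snapshot of the primary shards
    return cluster, extra

print(cluster_and_gateway_size(100, 3, "local"))   # (300, 0)
print(cluster_and_gateway_size(100, 3, "hadoop"))  # (300, 100)
```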
I recommend using the local gateway, since it does not incur the overhead
of snapshotting.
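For reference, gateway selection lives in elasticsearch.yml. The sketch below assumes the 0.x-era setting names; the exact keys (especially for the S3 and HDFS gateway plugins) depend on your version, so check the docs for the release you run:

```yaml
# Local gateway (the default): no extra storage beyond the nodes' own disks.
gateway.type: local

# Shared gateway sketches (assumption: 0.x-era plugin settings; verify the
# exact keys against your version's documentation before using):
# gateway.type: s3
# gateway.s3.bucket: my-es-gateway-bucket
#
# gateway.type: hdfs
# gateway.hdfs.uri: hdfs://namenode:8020
# gateway.hdfs.path: /elasticsearch/gateway
```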