Sizing ES Gateways

Hi All,

Anyone have some practical guidance on sizing the disk space needed for ES
gateways? If I have 100TB in indices and a replication factor of 3, I'd
expect 300TB of indices in the cluster. What size should I expect my
gateway data to take? What factors impact it? Specifically:

  • Gateway Type - Guessing the Hadoop option depends on the replication
    factor
  • Number of Machines in Cluster
  • Snapshot Frequency
  • ...

If it's not an exact science, even some rule of thumb guidance would be
helpful. Thanks,

--Mike

It depends on the gateway. Regardless of which gateway you use, the cluster
needs disk space for the indices themselves (300TB in your case), spread
across the nodes. If you use the local gateway, you don't need any
additional space: on full shutdown and restart, the cluster restores its
state from the local storage of each node (hence the name), and the replicas
provide high availability. If you use a shared gateway (fs, hadoop, s3), the
indices are snapshotted (copied) to the shared gateway, and the space
required for that is the size of the "primary shards" (without replicas), in
your case 100TB.
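
As a rough illustration of the arithmetic above, here is a minimal Python
sketch. The function name and its parameters are made up for this example
and are not part of any ES API. It assumes "replication factor of 3" means
three total copies of each shard (primary plus two replicas), which matches
the 300TB figure in the question, and it treats the shared gateway as
holding exactly one copy of the primary shards, ignoring any extra overhead.

```python
# Hypothetical helper, not an ES API: estimates disk needs from the rule of
# thumb in this thread.
SHARED_GATEWAYS = {"fs", "hadoop", "s3"}


def gateway_disk_tb(primary_tb, total_copies, gateway_type):
    """Return (cluster_tb, gateway_tb) for the given gateway type.

    primary_tb    -- size of the primary shards only, in TB
    total_copies  -- primary plus replicas (assumed meaning of
                     "replication factor" in the question)
    gateway_type  -- "local", or one of the shared types
    """
    if gateway_type != "local" and gateway_type not in SHARED_GATEWAYS:
        raise ValueError("unknown gateway type: %r" % gateway_type)

    # The nodes always hold every copy of every shard.
    cluster_tb = primary_tb * total_copies

    # A shared gateway additionally stores a snapshot of the primaries;
    # the local gateway reuses each node's own storage, so it adds nothing.
    gateway_tb = primary_tb if gateway_type in SHARED_GATEWAYS else 0

    return cluster_tb, gateway_tb


print(gateway_disk_tb(100, 3, "local"))  # (300, 0)
print(gateway_disk_tb(100, 3, "s3"))     # (300, 100)
```

Under these assumptions, neither the number of machines nor the replica
count changes the shared gateway figure; only the size of the primaries
does.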

I recommend using the local gateway, since it does not incur the overhead of
"snapshotting".

-shay.banon

On Thu, Dec 1, 2011 at 10:47 PM, Michael Sick <
michael.sick@serenesoftware.com> wrote:
