Huge Index


(senthil prabhu) #1

hi team,
If I have 500 GB of data for an index, how can I configure
shards and replicas for it, and also the gateway?

How many servers do I need to achieve this?

Is it possible to split the gateway?

Pls help..


(harryf) #2

For understanding the role of shards and replicas, this thread from
the mailing list is a good read - http://goo.gl/hFOVr

On Dec 30, 10:16 am, senthil prabhu senthils...@gmail.com wrote:


(dbenson) #3

When we started our deployment, we thought the shared gateway would be
ideal. Having a central place with all our index data seemed
conceptually nice, and it would provide the warm fuzzy of a backup.

But when we got further along, we came to recognize the limitations of
the shared file system gateway and the benefits of the local gateway.

  • Single point of failure - if the gateway goes down for an extended
    period, snapshotting will fail. We used confirmation of the
    snapshot to acknowledge a new doc from our document repository.
  • NFS can easily saturate 1G NICs and cause clusters to become split,
    which requires manual intervention.
  • The gateway needs to be large enough to store the full index.

The local gateway ends up acting like software RAID 10 (1+0)

  • Replica count acts as mirroring (RAID 1). In our environment we have
    replica=1, but we have multiple data centers, each with a full index.
    In the event of an extensive failure, we can route client traffic around
    the problem. If you don't have this luxury, you may want a higher
    replica count.
  • Shards act like striping (RAID 0). Set the shard count so that
    each shard stays a manageable size.
  • Adding more servers means each server holds a smaller portion of
    the overall index, which improves performance and reliability.
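A minimal sketch of how these knobs map to configuration in that era of
Elasticsearch (values are illustrative, not a recommendation - 10
primaries would give roughly 50 GB per shard for a 500 GB index):

```yaml
# elasticsearch.yml fragment (shard/replica counts illustrative)
index.number_of_shards: 10     # primaries; fixed at index creation
index.number_of_replicas: 1    # one extra copy of each shard (RAID 1 role)
gateway.type: local            # local gateway instead of a shared filesystem
```

Note that the primary shard count cannot be changed after the index is
created, so it pays to pick it with eventual data volume in mind.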

The size of the index is one consideration for the number of servers,
but query volume and complexity are another driving factor. I'd
consider two servers an absolute bare minimum, with three providing a
margin for failure. Obviously, your budget may constrain or expand
your choices.
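The sizing reasoning above can be sketched as back-of-the-envelope
arithmetic (all numbers are illustrative assumptions, not measurements):

```python
# Rough capacity math for a 500 GB index (illustrative values).
index_gb = 500
target_shard_gb = 50            # keep each shard a manageable size
replicas = 1                    # one extra copy of every shard

shards = index_gb // target_shard_gb       # primary shard count
total_gb = index_gb * (1 + replicas)       # storage across the whole cluster
servers = 3                                # bare minimum with a failure margin
per_server_gb = total_gb / servers         # index data each node must hold

print(shards, total_gb, round(per_server_gb))
```

The point is that replicas multiply total storage, while adding servers
divides the per-node share, so both knobs feed the hardware budget.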

We have 3 servers in each data center, with 28M docs consuming 170G
disk (soon to shrink with ES 0.14), handling about 6k req/min for
client queries and 195k document matches/minute for alerting purposes.
With our hardware, we're hardly taxing them and still averaging
30-35ms response times.

David

