Balance between number of indices and shards per index

I'm setting up an Elasticsearch-based log cluster and I'm having some
doubts about how I should choose the number of indices and shards.
By default, Logstash and Kibana use per-day indices and Elasticsearch
defaults to five shards per index. I'm worried that this will create
an excessive number of shards with a log retention of, say, 100 days.
With one replica per shard I'd be facing 1000 shards cluster-wide.
With three or four data nodes that's at least 250 shards per node.
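(To make the arithmetic explicit, here's the calculation as a small
Python sketch; the retention, shard, and node counts are just the
assumptions from above:)

    # Cluster-wide shard count for daily indices (numbers from above).
    days = 100          # log retention in days, one index per day
    primaries = 5       # Elasticsearch's default primary shards per index
    replicas = 1        # one replica copy of each primary

    total = days * primaries * (1 + replicas)
    print(total)        # 1000 shards cluster-wide
    print(total // 4)   # 250 shards per node with four data nodes
    print(total // 3)   # 333 shards per node with three data nodes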

Whether this is too much obviously depends on the nodes and perhaps
on the size of the daily indices, but regardless, having that many
shards doesn't seem particularly advantageous. Would it
make more sense to use week-based indices or reduce the number of
(primary) shards per index to two or three to get the number of
shards per node down towards or below 100? Or should I stop worrying?
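(For reference, the primary shard count for future daily indices can
be lowered with an index template. A minimal sketch using Python and
the requests library, assuming Elasticsearch 1.x on localhost:9200 and
the default logstash-* index naming:)

    import requests

    # Template applied to every new index matching the Logstash daily
    # pattern, overriding the default of five primary shards.
    template = {
        "template": "logstash-*",
        "settings": {
            "number_of_shards": 2,    # two primaries instead of five
            "number_of_replicas": 1,  # keep one replica per primary
        },
    }

    resp = requests.put("http://localhost:9200/_template/logstash_shards",
                        json=template)
    resp.raise_for_status()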

--
Magnus Bäck | Software Engineer, Development Tools
magnus.back@sonymobile.com | Sony Mobile Communications


Short - Stop worrying!

Long - As you mentioned, this is very dependent on your node specs,
but ideally you want one shard per node. However, you can over-allocate
without running into problems, and it makes rebalancing easier when you
add more nodes to the cluster.
Keeping daily indices is better, since you can drop data in smaller
units (i.e. whole indices) to suit your needs, and if you use
elasticsearch-curator then your retention can be managed automatically.
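(What curator automates boils down to a dated DELETE per expired index.
A minimal hand-rolled sketch of the same idea, not curator's own CLI,
again assuming Python/requests, the default Logstash index naming, and
Elasticsearch on localhost:9200:)

    import datetime
    import requests

    RETENTION_DAYS = 100

    # Delete the daily index that has just aged out of the retention
    # window (Logstash names its daily indices logstash-YYYY.MM.DD).
    cutoff = datetime.date.today() - datetime.timedelta(days=RETENTION_DAYS)
    index = "logstash-{:%Y.%m.%d}".format(cutoff)

    resp = requests.delete("http://localhost:9200/" + index)
    if resp.status_code == 404:
        print("{0} already gone".format(index))  # nothing to delete
    else:
        resp.raise_for_status()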

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

