I'm setting up an Elasticsearch-based log cluster and I'm having some
doubts about how I should choose the number of indices and shards.
By default, Logstash and Kibana use per-day indices and Elasticsearch
defaults to five shards per index. I'm worried that this will create
an excessive number of shards with a log retention of, say, 100 days.
With one replica per shard I'd be facing 1000 shards cluster-wide.
With three or four data nodes that's at least 250 shards per node.
Whether this is too much obviously depends on the node specs and perhaps
on the size of the daily indices, but regardless, having that many
shards doesn't seem particularly advantageous. Would it
make more sense to use week-based indices or reduce the number of
(primary) shards per index to two or three to get the number of
shards per node down towards or below 100? Or should I stop worrying?
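
To make the trade-off concrete, here is the shard arithmetic as a
small Python sketch (the weekly figure of 15 indices for 100 days is
my own rounding):

    # Shard math for the scenarios above: 100 days of retention,
    # 1 replica per shard, 4 data nodes.
    RETENTION_DAYS = 100
    REPLICAS = 1
    DATA_NODES = 4

    def shards_per_node(indices, primaries_per_index):
        # Total shards = indices * primaries * (primary + replicas),
        # spread evenly across the data nodes.
        total = indices * primaries_per_index * (1 + REPLICAS)
        return total / DATA_NODES

    print(shards_per_node(RETENTION_DAYS, 5))  # daily, 5 primaries -> 250.0
    print(shards_per_node(RETENTION_DAYS, 2))  # daily, 2 primaries -> 100.0
    print(shards_per_node(15, 5))              # weekly, 5 primaries -> 37.5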
--
Magnus Bäck | Software Engineer, Development Tools
magnus.back@sonymobile.com | Sony Mobile Communications
Long - As you mentioned, this is very dependent on your node specs, but
ideally you want one shard per node. However, you can over-allocate
without running into problems, and it makes balancing easier when you
add more nodes to the cluster. Daily indices are better, as you can
drop smaller units (i.e. indices) of data to suit your needs, and if
you use elasticsearch-curator then your retention can be managed
automatically.
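
For the shard count itself, something along these lines would lower
the primary count for future daily indices. This is only a sketch
using the official elasticsearch Python client; the template name and
the shard count of 2 are assumptions for illustration, not a
recommendation for your cluster:

    # Hedged sketch: lower the primary shard count for new Logstash
    # daily indices via an index template.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # assumed local node

    es.indices.put_template(
        name="logstash-fewer-shards",   # assumed template name
        body={
            "template": "logstash-*",   # matches Logstash's daily indices
            "settings": {
                "number_of_shards": 2,   # down from the default of 5
                "number_of_replicas": 1,
            },
        },
    )

Note that existing indices keep their shard count; a template only
affects indices created after it is installed.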