Shard Recommendation for Elasticsearch

Hi All,

I am planning to grow my Elasticsearch cluster from 2 data nodes to 4 and
have a question about how many shards to use for the indexes. I am running
the ELK stack, and currently each index, one per day, is created with 5
shards per node. As you can imagine, this will create a lot of shards across
the nodes over time. I have read that having too many shards is bad for the
cluster's health. Is there a better way to work out a shard / replica
strategy that avoids issues but maintains redundancy? Thanks for your help.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/00ff2f14-b2ae-4141-82ca-05872b94d673%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

The number of shards determines how far you can scale out if you add more
nodes in the future. With your current shard count of 5, you cannot
optimally distribute an index across a cluster of 6 or more nodes. However,
your data is time-based, with one index per day. Are queries on historical
data important? I would start off with a shard count of 4 per index, letting
each node receive part of the index (ideally more of the index with
replication), and then change the shard count for new indices if you grow
the cluster. Your older indices may not be optimally distributed, but your
new ones, presumably your more important ones, will be.

Cheers,

Ivan
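
To make the distribution point above concrete, here is a minimal sketch in
Python. It assumes shard copies are spread as evenly as possible across the
nodes, which is a simplification of what the Elasticsearch allocator does,
and it assumes 1 replica per primary (not stated in the thread). The function
name shards_per_node is made up for illustration.

```python
def shards_per_node(primaries: int, replicas: int, nodes: int) -> list:
    """Spread every shard copy of one index as evenly as possible over the nodes.

    Simplified model: real allocation also weighs disk usage and other indices.
    """
    total = primaries * (1 + replicas)       # primaries plus their replica copies
    base, extra = divmod(total, nodes)       # 'extra' nodes carry one more shard
    return [base + 1 if i < extra else base for i in range(nodes)]


if __name__ == "__main__":
    # Assumption: 1 replica per primary, 4 data nodes.
    for primaries in (5, 4):
        print(primaries, "primaries ->", shards_per_node(primaries, replicas=1, nodes=4))
```

Under these assumptions, 5 primaries leave two nodes holding one more shard
of each daily index than the others, while 4 primaries divide evenly.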

Hi Ivan,

Thanks for the reply. So if I store data, one index per day, across 6 data
nodes (4 or 5 shards on each node) for a year, that's something like 10,000
shards in the cluster. Does that make sense? And is this safe?
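
The estimate above can be checked with a short Python sketch. The first two
figures follow the numbers in the question (6 nodes, 4 or 5 shards per node
per daily index, 365 days). The last one is a different reading, assumed here
only for comparison: 4 primary shards per index with 1 replica each. The
function name yearly_shards is hypothetical.

```python
def yearly_shards(shards_per_daily_index: int, days: int = 365) -> int:
    """Total shards in the cluster if one index is created per day and none are deleted."""
    return shards_per_daily_index * days


if __name__ == "__main__":
    nodes = 6
    for per_node in (4, 5):
        # Reading from the question: every node holds 4 or 5 shards of each daily index.
        print(per_node, "per node ->", yearly_shards(nodes * per_node))

    # Assumed alternative: 4 primaries per index, 1 replica each (8 shards per day).
    print("4 primaries + 1 replica ->", yearly_shards(4 * (1 + 1)))
```

With 4 or 5 shards per node the yearly total lands on either side of the
10,000 figure, so the estimate in the question is in the right range for
that reading.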

Each shard is a Lucene index, so it will consume resources at the file
system level. Elasticsearch itself will be able to handle the coordination
between many shards. Next, you need to think about how much data each shard
actually holds. Distributed logging can create large volumes of logs,
perhaps too much for a 4 node cluster.

--
Ivan
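
As a rough way to think about how much data each shard holds, the Python
sketch below divides a daily log volume by the number of primary shards and
estimates the disk needed per node for a retention period. The 40 GB per day
figure, the 1 replica, and both function names are assumptions made up for
illustration; none of them come from the thread.

```python
def gb_per_primary_shard(daily_gb: float, primaries: int) -> float:
    """Approximate size of one primary shard of a daily index."""
    return daily_gb / primaries


def gb_per_node(daily_gb: float, replicas: int, days: int, nodes: int) -> float:
    """Approximate disk per node if data is spread evenly and kept for 'days' days."""
    return daily_gb * (1 + replicas) * days / nodes


if __name__ == "__main__":
    daily_gb = 40.0  # hypothetical daily log volume, not from the thread
    print("per primary shard:", gb_per_primary_shard(daily_gb, primaries=4), "GB")
    print("per node for a year:", gb_per_node(daily_gb, replicas=1, days=365, nodes=4), "GB")
```

Plugging in a measured daily volume in place of the assumed one shows
quickly whether a year of retention fits on a 4 node cluster.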
