Hi All,

I am currently planning to build out to a 4-data-node Elasticsearch cluster from the 2 we have now, and I have a question about how many shards to use for the indices. I am running the ELK stack, and each daily index is currently being created with 5 shards per node. As you can imagine, this will add up to a lot of shards across the nodes over time. I have read that having too many shards is bad for the cluster's health. Is there a better way to work out a shard/replica strategy that avoids those issues but maintains redundancy? Thanks for your help.
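For reference, this is roughly how I have been checking what the daily indices get created with (a minimal sketch using the Python elasticsearch client; the index name is just an example of our logstash-YYYY.MM.DD naming, and the 1-replica default is my assumption):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Look at what one of the daily indices was actually created with.
# The index name is illustrative; ours follow the logstash-YYYY.MM.DD pattern.
settings = es.indices.get_settings(index="logstash-2014.10.18")
print(settings)
# I expect to see number_of_shards: "5" and (presumably) number_of_replicas: "1".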
The number of shards will help you scale out if you add more nodes in the future. With your current shard count of 5, you cannot optimally deploy and distribute a cluster of 6 or more nodes. However, your data is time-based, one index per day. Are queries on historical data important? I would start with a shard count of 4 per index, letting each node receive part of the index (ideally more of the index with replication), and then change the shard count if you grow the cluster. Your older indices may not be optimally distributed, but your new ones, and presumably your more important ones, will be.
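If it helps, here is a minimal sketch of how you could pin that down with an index template so every new daily index is created with 4 shards (Python client shown; the template name and the logstash-* pattern are only examples, adjust them to your index naming):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# New daily indices matching the pattern will pick up these settings.
# Existing indices keep their shard count, since the number of primary
# shards cannot be changed after an index is created.
es.indices.put_template(
    name="logstash_shards",           # example template name
    body={
        "template": "logstash-*",     # match the daily logstash-YYYY.MM.DD indices
        "settings": {
            "number_of_shards": 4,    # one primary per data node in a 4-node cluster
            "number_of_replicas": 1,  # one extra copy of each shard for redundancy
        },
    },
)

With 4 primaries and 1 replica, that is 8 shards per daily index, i.e. 2 per node on a 4-node cluster, and you can raise the shard count in the template later when you add nodes.

Cheers,
Ivan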
Thanks for the reply. So if I store data, one index per day, across 6 data nodes (4 or 5 shards on each node) for a year, that's something like 10,000 shards in the cluster. Does that make sense? And also, is this safe?
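The back-of-the-envelope math I am doing, for what it is worth (all the numbers are just my rough assumptions):

days = 365             # one index per day, retained for a year
shards_per_node = 5    # roughly 4-5 shards of each daily index end up on every node
data_nodes = 6
total_shards = days * shards_per_node * data_nodes
print(total_shards)    # 10950, which is where my "something like 10,000" comes from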
Each shard is a Lucene index, so it will consume resources at the file system level. Elasticsearch itself will be able to handle the coordination between many shards. What you need to think about next is how much data each shard actually holds. Distributed logging can produce large volumes of logs, perhaps too much for a 4-node cluster.
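A quick way to sanity-check that is to look at how much data each shard actually holds on disk; the cat shards API reports per-shard document counts and store sizes. A sketch with the Python client (the index pattern is just an example):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# One line per shard: index, shard number, primary/replica, doc count,
# store size on disk, and the node it lives on. Handy for spotting daily
# indices whose shards are tiny (too many shards) or oversized (too few).
print(es.cat.shards(index="logstash-*", v=True,
                    h="index,shard,prirep,docs,store,node"))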