# Figuring out the optimal number of shards

(Otis GospodnetiÄ‡) #1

Hello,

Is the following the correct (and complete!) thinking one should apply
when considering how many shards to specify when creating an index?

Set the number of shards to a high number if you think the number of
nodes in the cluster will grow over time. If you want to be able to
have a 100-node cluster, then create an index with at least 100
shards. Then if there is no shard replication, each node will have 1
shard. If number of shard replicas=1 then each node will have 2
shards - 1 primary shard and 1 replica of a shard whose primary node
is some other node in the cluster.

If the above is correct, then should one specify the number of index
shards to be equal to the eventual number of nodes in the cluster if
the number of replica is set to 0? More generally, # shards = # nodes
x # replicas?

(To keep things simple and to keep the focus on figuring out the
number of shards I'm ignoring other factors like RAM vs. shard/index
size, number of concurrent queries, query complexity, etc.)

The above may not be all correct, so I'm interested in feedback and
discussion.

Thanks,
Otis

(Shay Banon) #2

Yes, sound reasoning, though, you should also take replicas into account. So, a 50 shard with 1 replica index will make use of 100 nodes.

On Tuesday, June 21, 2011 at 10:10 PM, Otis wrote:

Hello,

Is the following the correct (and complete!) thinking one should apply
when considering how many shards to specify when creating an index?

Set the number of shards to a high number if you think the number of
nodes in the cluster will grow over time. If you want to be able to
have a 100-node cluster, then create an index with at least 100
shards. Then if there is no shard replication, each node will have 1
shard. If number of shard replicas=1 then each node will have 2
shards - 1 primary shard and 1 replica of a shard whose primary node
is some other node in the cluster.

If the above is correct, then should one specify the number of index
shards to be equal to the eventual number of nodes in the cluster if
the number of replica is set to 0? More generally, # shards = # nodes
x # replicas?

(To keep things simple and to keep the focus on figuring out the
number of shards I'm ignoring other factors like RAM vs. shard/index
size, number of concurrent queries, query complexity, etc.)

The above may not be all correct, so I'm interested in feedback and
discussion.

Thanks,
Otis

(Otis GospodnetiÄ‡) #3

Right. I wanted to keep the number of replicas out of the discussion
to keep things simple.

So now the second part of my question:

Isn't there a slight and temporary "conflict" here around the number
of concurrent queries, the number of CPU cores per node, and the fixed
number of shards per node?

Let me give an example.

• Say I need to configure a cluster that I know will eventually need
to be big in terms of the number of nodes because it will have to hold
a lot of data.
Because of that I pick a large number of shards (following the logic
from my email below + taking into account replication, of course).

• Say the initial data volume is small and will only grow over time.
Because of that I build a cluster with initially a small number of
nodes.

• Because of the above, each node holds not one, but a number of
shards.

• Say that my cluster needs to handle a high query rate from day one.

Before I explain my thinking, please keep this is mind:

• Searching a single larger index on a system with 1 CPU core is
faster then searching N indices, each 1/N the size of the large index.

Now, because I have nodes that each hold a number of shards, under a
heavy-enough query load each node will need to handle multiple queries
concurrently. This is OK as long as the node has enough CPU cores to
handle all concurrent queries and no queries are waiting for CPU. But
what happens when the query load exceeds that? Isn't it the case that
then search is not as efficient as it would be with fewer larger
shards? I think the answer is yes.

So my point here is that because I'm forced to specify the number of
shards that will eventually be an optimal number of shards when my
data and the number of nodes grow, until I get there I may have a
smaller cluster with less data, but high query load that is not as
efficient as it could be if I could initially have fewer shards and
change (increase) that number over time as I add more nodes (i.e. CPU
cores) to the cluster.

I'd appreciate it if somebody could confirm this or point out flaws in
my thinking.

Thanks,
Otis

On Jun 23, 7:41 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Yes, sound reasoning, though, you should also take replicas into account. So, a 50 shard with 1 replica index will make use of 100 nodes.

On Tuesday, June 21, 2011 at 10:10 PM, Otis wrote:

Hello,

Is the following the correct (and complete!) thinking one should apply
when considering how many shards to specify when creating an index?

Set the number of shards to a high number if you think the number of
nodes in the cluster will grow over time. If you want to be able to
have a 100-node cluster, then create an index with at least 100
shards. Then if there is no shard replication, each node will have 1
shard. If number of shard replicas=1 then each node will have 2
shards - 1 primary shard and 1 replica of a shard whose primary node
is some other node in the cluster.

If the above is correct, then should one specify the number of index
shards to be equal to the eventual number of nodes in the cluster if
the number of replica is set to 0? More generally, # shards = # nodes
x # replicas?

(To keep things simple and to keep the focus on figuring out the
number of shards I'm ignoring other factors like RAM vs. shard/index
size, number of concurrent queries, query complexity, etc.)

The above may not be all correct, so I'm interested in feedback and
discussion.

Thanks,
Otis

(Berkay Mollamustafaoglu-2) #4

One option may be using your own sharding logic rather than letting ES
automatically shard the data. For example, for log data, you can shard data
based on day, week, etc. And when you're searching, you can add some logic
to send the query only to shards that have data.

I realize that not all data can be sharded this way, but in many cases it
seems to be possible and may be preferable to auto sharding, so it should be
part of the design process.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

On Thu, Jun 23, 2011 at 11:43 AM, Otis otis.gospodnetic@gmail.com wrote:

Right. I wanted to keep the number of replicas out of the discussion
to keep things simple.

So now the second part of my question:

Isn't there a slight and temporary "conflict" here around the number
of concurrent queries, the number of CPU cores per node, and the fixed
number of shards per node?

Let me give an example.

• Say I need to configure a cluster that I know will eventually need
to be big in terms of the number of nodes because it will have to hold
a lot of data.
Because of that I pick a large number of shards (following the logic
from my email below + taking into account replication, of course).

• Say the initial data volume is small and will only grow over time.
Because of that I build a cluster with initially a small number of
nodes.

• Because of the above, each node holds not one, but a number of
shards.

• Say that my cluster needs to handle a high query rate from day one.

Before I explain my thinking, please keep this is mind:

• Searching a single larger index on a system with 1 CPU core is
faster then searching N indices, each 1/N the size of the large index.

Now, because I have nodes that each hold a number of shards, under a
heavy-enough query load each node will need to handle multiple queries
concurrently. This is OK as long as the node has enough CPU cores to
handle all concurrent queries and no queries are waiting for CPU. But
what happens when the query load exceeds that? Isn't it the case that
then search is not as efficient as it would be with fewer larger
shards? I think the answer is yes.

So my point here is that because I'm forced to specify the number of
shards that will eventually be an optimal number of shards when my
data and the number of nodes grow, until I get there I may have a
smaller cluster with less data, but high query load that is not as
efficient as it could be if I could initially have fewer shards and
change (increase) that number over time as I add more nodes (i.e. CPU
cores) to the cluster.

I'd appreciate it if somebody could confirm this or point out flaws in
my thinking.

Thanks,
Otis

On Jun 23, 7:41 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Yes, sound reasoning, though, you should also take replicas into account.
So, a 50 shard with 1 replica index will make use of 100 nodes.

On Tuesday, June 21, 2011 at 10:10 PM, Otis wrote:

Hello,

Is the following the correct (and complete!) thinking one should apply
when considering how many shards to specify when creating an index?

Set the number of shards to a high number if you think the number of
nodes in the cluster will grow over time. If you want to be able to
have a 100-node cluster, then create an index with at least 100
shards. Then if there is no shard replication, each node will have 1
shard. If number of shard replicas=1 then each node will have 2
shards - 1 primary shard and 1 replica of a shard whose primary node
is some other node in the cluster.

If the above is correct, then should one specify the number of index
shards to be equal to the eventual number of nodes in the cluster if
the number of replica is set to 0? More generally, # shards = # nodes
x # replicas?

(To keep things simple and to keep the focus on figuring out the
number of shards I'm ignoring other factors like RAM vs. shard/index
size, number of concurrent queries, query complexity, etc.)

The above may not be all correct, so I'm interested in feedback and
discussion.

Thanks,
Otis

(Clinton Gormley) #5

Hi Otis

Now, because I have nodes that each hold a number of shards, under a
heavy-enough query load each node will need to handle multiple queries
concurrently. This is OK as long as the node has enough CPU cores to
handle all concurrent queries and no queries are waiting for CPU. But
what happens when the query load exceeds that? Isn't it the case that
then search is not as efficient as it would be with fewer larger
shards? I think the answer is yes.

So my point here is that because I'm forced to specify the number of
shards that will eventually be an optimal number of shards when my
data and the number of nodes grow, until I get there I may have a
smaller cluster with less data, but high query load that is not as
efficient as it could be if I could initially have fewer shards and
change (increase) that number over time as I add more nodes (i.e. CPU
cores) to the cluster.

Don't forget that you don't have to fix in stone now what your cluster
may look like in 2 years time.

There are two options to allow you to migrate as you go:

1. Reindex

This is the most basic. Start with 5 nodes. In a year's time, you
realise you need 10 nodes. Reindex to a new index.

Instead of referring to your index by its actual name, create an alias
so that you can reindex in the background, then switch the alias to
point to the new index,

1. Aliases

You can create one alias for writing "index_write" which points to the

That way you can keep adding indices/shards without having to reindex

clint

(Shay Banon) #6

Hi Otis,

First, to correct a wrong assumption:

• Searching a single larger index on a system with 1 CPU core is
faster then searching N indices, each 1/N the size of the large index
Thats not always the case. When searching, IO plays a big role as well.

Back to the question, it basically depends on the type of data, or, more specifically, how your data grows.

A case where you start with 2 nodes and then need to grow to 100 is quite extreme (or, at least, sounds like it). This is a huge change in data size, which affects many aspects (besides ES). And, starting with 20 and growing to 100 makes the side affect of having 100 shards less relevant.

Next, there are strategies to handle expansion of data. One example is time based data (like logs). You can easily create an index per time frame (week, month) and use aliases to hide the fact. The index itself can have shards appropriate to the size of data collected during that time period, and can easily be adapted (based on historic stats on data collected).

Another is "user base data". One option is to create that 100 shards index even on 2 nodes, but use aliases, routing, and filtering to handle it. You can use the username as the routing value when indexing and searching, which means you will hit a single shard, and filtering to filter out docs that don't belong to a user. This is made simpler with the aliases enhancements done in master, where one can associate both a filter and a routing value against an alias (so the alias can be the username for example, with a routing as the username, and a filter based on the username, hitting that username alias will take care of things automatically for you).

Last, as clinton suggested, reindexing when needed is also possible.

-shay.banon

On Thursday, June 23, 2011 at 6:43 PM, Otis wrote:

Right. I wanted to keep the number of replicas out of the discussion
to keep things simple.

So now the second part of my question:

Isn't there a slight and temporary "conflict" here around the number
of concurrent queries, the number of CPU cores per node, and the fixed
number of shards per node?

Let me give an example.

• Say I need to configure a cluster that I know will eventually need
to be big in terms of the number of nodes because it will have to hold
a lot of data.
Because of that I pick a large number of shards (following the logic
from my email below + taking into account replication, of course).

• Say the initial data volume is small and will only grow over time.
Because of that I build a cluster with initially a small number of
nodes.

• Because of the above, each node holds not one, but a number of
shards.

• Say that my cluster needs to handle a high query rate from day one.

Before I explain my thinking, please keep this is mind:

• Searching a single larger index on a system with 1 CPU core is
faster then searching N indices, each 1/N the size of the large index.

Now, because I have nodes that each hold a number of shards, under a
heavy-enough query load each node will need to handle multiple queries
concurrently. This is OK as long as the node has enough CPU cores to
handle all concurrent queries and no queries are waiting for CPU. But
what happens when the query load exceeds that? Isn't it the case that
then search is not as efficient as it would be with fewer larger
shards? I think the answer is yes.

So my point here is that because I'm forced to specify the number of
shards that will eventually be an optimal number of shards when my
data and the number of nodes grow, until I get there I may have a
smaller cluster with less data, but high query load that is not as
efficient as it could be if I could initially have fewer shards and
change (increase) that number over time as I add more nodes (i.e. CPU
cores) to the cluster.

I'd appreciate it if somebody could confirm this or point out flaws in
my thinking.

Thanks,
Otis

On Jun 23, 7:41 am, Shay Banon <shay.ba...@elasticsearch.com (http://elasticsearch.com)> wrote:

Yes, sound reasoning, though, you should also take replicas into account. So, a 50 shard with 1 replica index will make use of 100 nodes.

On Tuesday, June 21, 2011 at 10:10 PM, Otis wrote:

Hello,

Is the following the correct (and complete!) thinking one should apply
when considering how many shards to specify when creating an index?

Set the number of shards to a high number if you think the number of
nodes in the cluster will grow over time. If you want to be able to
have a 100-node cluster, then create an index with at least 100
shards. Then if there is no shard replication, each node will have 1
shard. If number of shard replicas=1 then each node will have 2
shards - 1 primary shard and 1 replica of a shard whose primary node
is some other node in the cluster.

If the above is correct, then should one specify the number of index
shards to be equal to the eventual number of nodes in the cluster if
the number of replica is set to 0? More generally, # shards = # nodes
x # replicas?

(To keep things simple and to keep the focus on figuring out the
number of shards I'm ignoring other factors like RAM vs. shard/index
size, number of concurrent queries, query complexity, etc.)

The above may not be all correct, so I'm interested in feedback and
discussion.

Thanks,
Otis

(system) #7