Optimal number of shards per node


(Rajan Bhatt) #1

Hello,

I would appreciate it if someone could suggest an optimal number of shards
per ES node for best performance, or a recommended way to arrive at a shard
count given a node's core count and memory footprint.

Thanks in advance
Regards
Rajan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/57774626-f484-48c6-9b84-408df9ced896%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Zachary Tong) #2

Unfortunately, there is no way that we can tell you an optimal number. But
there is a way that you can perform some capacity tests, and arrive at
usable numbers that you can extrapolate from. The process is very simple:

  • Create a single index, with a single shard, on a single
    production-style machine
  • Start indexing *real, production-style* data. "Fake" or "dummy" data
    won't work here; it needs to mimic real-world data
  • Periodically, run real-world queries that you would expect users to
    enter
  • At some point, you'll find that performance is no longer acceptable to
    you. Perhaps the indexing rate becomes too slow. Or perhaps query latency
    grows too high. Or perhaps your node just runs out of memory
  • Write down the number of documents in the shard, and the physical size
    of the shard

Now you know the limit of a single shard given your hardware + queries +
data. Using that knowledge, you can extrapolate given your expected
search/indexing load, and how many documents you expect to index over the
next few years, etc.

-Zach



(Rajan Bhatt) #3

Thanks Zach.

So on a single node, this test will tell us how much a single node with a
single shard can handle. If we then want to deploy more shards per node, we
need to take into consideration that more shards per node consume more
resources (file descriptors, memory, etc.), and that performance would
degrade as more shards are added to a node.

This is tricky, and mileage can vary with different workloads (indexing +
searching).

Could you describe your deployment at a very high level (number of ES nodes,
indices, shards, replicas) to give us some idea?
I appreciate your answer and your time.

By the way, which tool do you use for monitoring your ES cluster, and what
do you monitor?
Thanks
Rajan



(Mark Walkom) #4

ElasticHQ, Marvel, bigdesk and kopf are some of the better monitoring
plugins.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Otis Gospodnetić) #5

Hi Rajan,

http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/health.html

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

tel: +1 347 480 1610 fax: +1 718 679 9190



(Zachary Tong) #6

Well, the only good reason to have more than one shard per node is to plan
for eventual growth/scaling. If your data is fixed and not growing, you
should plan on a single shard per server from the start, because it is more
efficient. That's a generalization, but true more often than not.

The idea is to capacity plan how much a single shard is capable of
holding/servicing so that you can use it to extrapolate. E.g. "In six
months, we expect to have 500m docs and 1000 queries per second". Based on
the test I outlined, you find out that it will require 30 servers to meet
your SLA. But today you only have 30m docs, so you can get away with 3
servers and just add new nodes as your data increases. It's a way to take
guesswork out of scaling.
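
The arithmetic behind that extrapolation can be sketched in a few lines of
Python (the per-shard limit here is a made-up figure standing in for
whatever your own capacity test measured):

```python
import math

def nodes_needed(projected_docs, docs_per_shard):
    """Nodes needed at one shard per node, given the per-shard
    document limit found in the single-shard capacity test."""
    return math.ceil(projected_docs / docs_per_shard)

# Hypothetical capacity-test result: one shard tops out around
# 17M documents under our query load before the SLA is breached.
DOCS_PER_SHARD = 17_000_000

# Six-month projection of ~500M docs -> ~30 nodes.
print(nodes_needed(500_000_000, DOCS_PER_SHARD))  # 30

# Today's ~30M docs -> start small and add nodes as data grows.
print(nodes_needed(30_000_000, DOCS_PER_SHARD))   # 2
```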

Obviously more shards per node will impact performance (some), but that's
why you plan to add more nodes as performance lags.

Hope that helps!
-Zach

PS. I personally use Marvel, but the other plugins/APIs mentioned are very
good too.


