Optimal number of shards per node


(Rajan Bhatt) #1

Hello,

I would appreciate it if someone could suggest an optimal number of shards
per ES node for best performance, or a recommended way to arrive at a shard
count given a node's core count and memory footprint.

Thanks in advance
Regards
Rajan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/57774626-f484-48c6-9b84-408df9ced896%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Zachary Tong) #2

Unfortunately, there is no way that we can tell you an optimal number. But
there is a way that you can perform some capacity tests, and arrive at
usable numbers that you can extrapolate from. The process is very simple:

  • Create a single index, with a single shard, on a single
    production-style machine
  • Start indexing *real, production-style* data. "Fake" or "dummy" data
    won't work here; it needs to mimic real-world data
  • Periodically, run real-world queries that you would expect users to
    enter
  • At some point, you'll find that performance is no longer acceptable to
    you. Perhaps the indexing rate becomes too slow. Or perhaps query latency
    grows too high. Or perhaps your node just runs out of memory
  • Write down the number of documents in the shard, and the physical size
    of the shard

Now you know the limit of a single shard given your hardware + queries +
data. Using that knowledge, you can extrapolate given your expected
search/indexing load, and how many documents you expect to index over the
next few years, etc.

-Zach



(Rajan Bhatt) #3

Thanks Zach.

So on a single node, this test will tell us how much a single node with a
single shard can handle. If we then want to deploy more shards per node, we
need to take into consideration that more shards per node consume more
resources (file descriptors, memory, etc.), and that performance would
degrade as more shards are added to a node.

This is tricky, and mileage can vary with different workloads (indexing +
searching).

Could you describe your deployment at a very high level (number of ES nodes,
indices, shards, replicas) to give us some idea?
I appreciate your answer and your time.

By the way, which tool do you use for monitoring your ES cluster, and what
do you monitor?
Thanks
Rajan



(Mark Walkom) #4

ElasticHQ, Marvel, bigdesk and kopf are some of the better monitoring
plugins.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Otis Gospodnetić) #5

Hi Rajan,

http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/health.html

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

tel: +1 347 480 1610 fax: +1 718 679 9190



(Zachary Tong) #6

Well, the only good reason to have more than one shard per node is to plan
for eventual growth/scaling. If your data is fixed and not growing, you
should plan on a single shard per server from the start, because it is more
efficient. That's a generalization, but true more often than not.

The idea is to capacity plan how much a single shard is capable of
holding/servicing so that you can use it to extrapolate. E.g. "In six
months, we expect to have 500m docs and 1000 queries per second". Based on
the test I outlined, you find out that it will require 30 servers to meet
your SLA. But today you only have 30m docs, so you can get away with 3
servers and just add new nodes as your data increases. It's a way to take
guesswork out of scaling.
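
The arithmetic behind that extrapolation can be sketched in a few lines of
Python (the per-shard limit here is a made-up figure standing in for
whatever your own capacity test measured):

```python
import math

def nodes_needed(projected_docs, docs_per_shard):
    """Nodes needed at one shard per node, given the per-shard
    document limit found in the single-shard capacity test."""
    return math.ceil(projected_docs / docs_per_shard)

# Hypothetical capacity-test result: one shard tops out around
# 17M documents under our query load before the SLA is breached.
DOCS_PER_SHARD = 17_000_000

# Six-month projection of ~500M docs -> ~30 nodes.
print(nodes_needed(500_000_000, DOCS_PER_SHARD))  # 30

# Today's ~30M docs -> start small and add nodes as data grows.
print(nodes_needed(30_000_000, DOCS_PER_SHARD))   # 2
```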

Obviously more shards per node will impact performance (some), but that's
why you plan to add more nodes as performance lags.

Hope that helps!
-Zach

PS. I personally use Marvel, but the other plugins/APIs mentioned are very
good too.


