Estimates of cluster size / hardware based on document count and size

Hi there,

Does anyone here have estimates of cluster size based on the number of documents of a certain size?

For example:

  • 50,000,000 x 1,024-byte documents (~47 GB) requires X servers with X hardware
  • 500,000,000 x 500-byte documents (~238 GB) requires X servers with X hardware

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

http://goo.gl/Lt7BC


--

It's very difficult to provide estimates based only on the number and size
of indexed documents. For example, on systems with heavy search traffic, the
peak rate of search requests and the required latency might be the most
important factors determining the number of nodes the cluster needs. It's
also important to consider what types of queries will be used, because
different queries have very different memory requirements. The type of data
being indexed can also have a significant impact on index size. There are
simply too many factors affecting memory, disk and CPU requirements to give
a reasonable estimate from the information provided.

I am sure this is not the answer you hoped to receive, but I would suggest
taking a significant subset of your data and indexing and searching it with
requests and load similar to what you anticipate in production, while
watching index size, memory and CPU. Start with a couple of small nodes and
load them until you reach the breaking point, then scale your cluster
accordingly. Relying on any other estimate is likely to be misleading.
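
For anyone who wants a concrete starting point, here is a rough sketch of the
kind of test described above, assuming Python 3 with the `requests` package
against a single Elasticsearch node on localhost:9200. The index name,
document shape, batch size and totals are made-up placeholders to replace
with your own data and queries:

```python
# Rough load-test sketch: bulk-index a subset of synthetic documents, then
# print the on-disk index size and the latency of a sample query.
# Assumes a recent Elasticsearch that accepts typeless bulk actions.
import json
import time

import requests

ES = "http://localhost:9200"
INDEX = "sizing-test"      # hypothetical index name
BATCH = 1_000              # documents per bulk request
TOTAL = 100_000            # index a representative subset, not all 50M docs


def bulk_index():
    """Index TOTAL synthetic ~1 KB documents in batches of BATCH."""
    padding = "x" * 900    # pad each document to roughly 1 KB of source
    for start in range(0, TOTAL, BATCH):
        lines = []
        for i in range(start, start + BATCH):
            lines.append(json.dumps({"index": {"_index": INDEX, "_id": str(i)}}))
            lines.append(json.dumps({"doc_id": i, "payload": padding}))
        body = "\n".join(lines) + "\n"
        resp = requests.post(f"{ES}/_bulk", data=body,
                             headers={"Content-Type": "application/x-ndjson"})
        resp.raise_for_status()


def report():
    """Print the on-disk index size and the latency of one simple query."""
    requests.post(f"{ES}/{INDEX}/_refresh")
    stats = requests.get(f"{ES}/{INDEX}/_stats/store").json()
    size = stats["_all"]["primaries"]["store"]["size_in_bytes"]
    print(f"store size: {size / 1024 / 1024:.1f} MB for {TOTAL} docs")

    t0 = time.time()
    requests.get(f"{ES}/{INDEX}/_search",
                 params={"q": "doc_id:42", "size": 10}).raise_for_status()
    print(f"sample query latency: {(time.time() - t0) * 1000:.1f} ms")


if __name__ == "__main__":
    bulk_index()
    report()
```

While it runs, watch JVM heap and CPU with GET /_nodes/stats, then
extrapolate the per-document disk and memory cost to your full data set and
to the query rate you expect in production.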


--

Hi Igor,

Thank you for your response. I actually expected an answer like this, but I
had hoped that document size and count would give at least a rough estimate.
Of course the volume of searches, the types of queries and so on all matter
too. It all makes sense.

I'll just go with testing it; that's the only reliable way.

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

http://goo.gl/Lt7BC


--

It would be great if you could post your benchmarking results and
information about your dataset. If more people do this, we will end up with
benchmarks for a variety of use cases.
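
As a small sketch of what is worth capturing when sharing such results (same
assumptions as the earlier snippet: a local node, the `requests` package, and
the placeholder index name), the standard stats APIs already contain most of
the interesting numbers:

```python
# Dump the stats endpoints that describe a benchmark run so they can be
# shared alongside a description of the dataset and the queries used.
import json

import requests

ES = "http://localhost:9200"

ENDPOINTS = {
    "cluster_stats": "/_cluster/stats",    # node count, shard count, JVM heap
    "nodes_stats": "/_nodes/stats",        # per-node memory, CPU and disk
    "index_stats": "/sizing-test/_stats",  # document count and store size
}

for name, path in ENDPOINTS.items():
    data = requests.get(ES + path).json()
    with open(f"{name}.json", "w") as out:
        json.dump(data, out, indent=2)
    print(f"wrote {name}.json")
```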


--