Elasticsearch System Requirements

What are the system requirements for running Elasticsearch? I've had a bit of trouble tracking down a specific number. In our dev/beta environments we start all machines on a standard VM build on CentOS 6.4 with 2 GB of RAM and 20 GB of disk space. We have installed Logstash to ship logs to Elasticsearch. At one point in development we hit a couple of issues with Elasticsearch where we were maxing out our CPU and disk space was disappearing quickly. In looking into this we wanted to check whether the machine had appropriate resources allocated to begin with, and, planning for the future, to get some idea of what we should be running Elasticsearch on.

As I have tried to gather information from various web sources I keep hearing that Elasticsearch should run with 64 GB of RAM per machine and that we should have a minimum of 3 machines. Our IT team felt this was very high and cost prohibitive, especially once we start deploying in the cloud. I wanted to follow up and check whether this is really the case. If it is, I would like to understand how long those resources would last before we needed more RAM/machines.

I understand it's all about the numbers. We are still in development so I do not have hard numbers, but I can give some close estimates based on the data we are currently seeing. When we go live we anticipate collecting 5-6 GB of logs per day from about 50 different servers. At this time our product owner wants to be able to search logs for up to 3 years. We are still evaluating whether there is a solution we could use to offload some of the data in increments and provide a secondary path to it if needed, but ideally we would have 3 years of searchable logs. Ultimately I am trying to determine what resources we need to make Elasticsearch effective, and whether we can accommodate those requirements. From there we can determine the volume and history of logs we would be able to store and search, to see whether this solution is going to work for our team.


It's really hard to tell, and it seems you have some competing constraints here:

  • budget "Our IT team felt this was very high and and cost prohibitive"
  • history "Our product owner wants to be able to search logs for up to 3 years"

I suspect you will not be able to satisfy all of these constraints at once.

I would start by testing on whatever machine your IT team agrees to give you: inject as many logs as you can into a single index with a single shard and measure how long queries take.
At some point you will no longer satisfy your product owner's requirements; that point is basically the number of documents a single shard can hold for you.
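
Something along these lines, just as a sketch -- localhost:9200, the index name, the field name and the query are all placeholders; I'm using Python's requests library here, but curl works just as well:

    # Sketch of the single-shard test. Assumes Elasticsearch on localhost:9200.
    import requests

    ES = "http://localhost:9200"

    # One index, one shard, no replica, so everything stays on this one machine.
    requests.put(ES + "/logs-test",
                 json={"settings": {"number_of_shards": 1, "number_of_replicas": 0}})

    # ...keep feeding logs in via Logstash or the bulk API...

    # Then time a representative query as the shard grows.
    query = {"query": {"match": {"message": "error"}}}
    resp = requests.post(ES + "/logs-test/_search", json=query).json()
    print(resp["took"], "ms spent inside Elasticsearch")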

Then add a new index on the same machine, fill it with just as many documents, and see how search behaves. If it's still fine, add another index, and so on…

That gives you the number of shards a single machine can hold given the RAM and CPU you have.
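
Continuing the sketch above, the second round would look something like this (again, the names are placeholders):

    import requests

    ES = "http://localhost:9200"

    # Second round: another single-shard index on the same box.
    requests.put(ES + "/logs-test-2",
                 json={"settings": {"number_of_shards": 1, "number_of_replicas": 0}})

    # ...fill it with the same volume of logs, then query both indices together
    # and compare the "took" numbers with the single-index run.
    query = {"query": {"match": {"message": "error"}}}
    resp = requests.post(ES + "/logs-test,logs-test-2/_search", json=query).json()
    print(resp["took"], "ms across both indices")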

Do you want replicas? If so, you will need at least two machines, because you will have twice as many shards to manage.
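
If you do, bumping the replica count is just a settings update -- a sketch again, the index name is a placeholder, and with a single node the replica shards simply stay unassigned until a second node joins:

    import requests

    # Ask for one copy of every shard in the index.
    requests.put("http://localhost:9200/logs-test/_settings",
                 json={"index": {"number_of_replicas": 1}})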

Let's say you find you can only hold 2 weeks of data. What then? Are you going to ask for a bigger budget, or relax the requirements?

5 GB per day is about 5.5 TB of data over 3 years; call it roughly 11 TB with 1 replica.
You can start with less memory than 64 GB per machine, but in that case I think you will need more nodes.
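
The back-of-the-envelope numbers, in case you want to play with them:

    # Raw log volume only; the actual index size also depends on mappings,
    # analysis and compression, so treat this as a rough lower bound.
    gb_per_day = 5
    raw_tb = gb_per_day * 365 * 3 / 1000.0   # ~5.5 TB of raw logs over 3 years
    with_replica_tb = raw_tb * 2             # ~11 TB with 1 replica
    print(raw_tb, with_replica_tb)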

Sorry I can't give you an actual number, but I really think you need to test your different scenarios. The nice thing about the cloud is that you can scale out very easily and test exactly that.

My 0.05 cents.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Thank you for your feedback. I understand your concerns with my 'constraints'. I am trying to assess what resources are necessary to meet the product owner's goal. Currently in our development environment, when we start a new server we are given a virtual machine with 20 GB of disk space and 2 GB of RAM. Asking for a machine with 4 GB of RAM rather than 64 GB is a much easier task, but I want to make sure I am asking for enough resources to provide an environment where I can get fair results in testing. Is testing how much one machine can handle with 2 GB of RAM sufficient, or is there a recommended minimum I should start with to get a fair test?


2 GB of RAM probably means only 1 GB of heap, which is really small for production usage IMHO.
On AWS we often recommend starting with m1.xlarge to avoid noisy neighbours; it comes with 15 GB of RAM by default, so about 7 GB of heap.
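
The rule of thumb we usually apply, as a sketch (the numbers are illustrative; ES_HEAP_SIZE is the environment variable the startup scripts of the 0.90/1.x era read):

    # Give Elasticsearch roughly half of the machine's RAM as JVM heap
    # (via ES_HEAP_SIZE) and leave the rest to the OS filesystem cache.
    def suggested_heap_gb(ram_gb):
        return ram_gb // 2

    for ram_gb in (2, 4, 15):
        print("%d GB RAM -> ES_HEAP_SIZE=%dg" % (ram_gb, suggested_heap_gb(ram_gb)))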

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



But it's all about the use case -- I have a production system running a single Elasticsearch node with -Xmx512m (on a 1 GB box) and it's working fine.

Needless to say, my dataset is not in the terabyte range, nor growing 5 GB/day :-)

FWIW,

Hassan Schroeder ------------------------ hassan.schroeder@gmail.com
Hassan Schroeder | about.me
twitter: @hassan


Right!

I'm running demos where 1M documents get injected in a minute, with Kibana on top of that, on default Elasticsearch settings!

It really depends on the use case! :-D
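
The injection itself is nothing fancy -- roughly this kind of thing through the bulk API (index, type and field names are made up; the 0.90/1.x versions discussed here still required a document type, which later releases dropped):

    import json, requests

    ES = "http://localhost:9200"

    # Build a newline-delimited bulk body: one action line, then one document line.
    lines = []
    for i in range(1000):
        lines.append(json.dumps({"index": {"_index": "demo", "_type": "log"}}))
        lines.append(json.dumps({"message": "sample log line %d" % i, "level": "INFO"}))
    body = "\n".join(lines) + "\n"   # the bulk API requires a trailing newline

    requests.post(ES + "/_bulk", data=body,
                  headers={"Content-Type": "application/x-ndjson"})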

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


If you want reasonable resources for faceting and field caching, 2 GB of RAM is very tight.

It will work, but you will have to consider running many nodes, also to distribute the data input from the 50 servers. You say "search logs for up to 3 years"; this sounds like simple search without faceting, just term and word matching. With faceting, you have to plan for larger heaps, 4-8 GB.
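
To make the distinction concrete, here is roughly what the two kinds of request look like -- only a sketch, the index and field names are examples; the facet (an aggregation in later versions) is what pulls the field's values into the heap:

    import requests

    ES = "http://localhost:9200/logs/_search"

    # Plain full-text matching: cheap on the heap.
    simple = {"query": {"match": {"message": "timeout"}}}

    # The same search plus a terms facet: this loads the field data for "host"
    # into memory on every node holding a shard of the index.
    faceted = {
        "query": {"match": {"message": "timeout"}},
        "facets": {"by_host": {"terms": {"field": "host"}}},
    }

    for body in (simple, faceted):
        print(requests.post(ES, json=body).json().get("took"))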

But 5-6 GB per day is more than 5 TB of input data over 3 years! That is an enormous amount; prepare for a very large ES cluster. I cannot give you exact advice, but keep an eye on capacity as it grows.

Jörg
