Elasticsearch System Requirements

Chris_Sham · December 16, 2013, 3:12pm

What are the system requirements for running Elasticsearch? I've had a bit
of trouble tracking down specific numbers. In our dev/beta environments
we start all machines from a standard VM build on CentOS 6.4 with 2 GB of
RAM and 20 GB of disk space. We have installed Logstash to ship logs to
Elasticsearch. At one point in development we had a couple of issues with
Elasticsearch where we were maxing out our CPU and disk space was filling
up quickly. While looking into this, we wanted to check whether the machine
had appropriate resources allocated to begin with and, planning for the
future, get some idea of what we should be running Elasticsearch on.

As I have tried to gather information from various web sources, I keep
hearing that Elasticsearch should run with 64 GB of RAM per machine and
that we should have a minimum of 3 machines. Our IT team felt this was very
high and cost prohibitive, especially once we start deploying in the
cloud. I wanted to follow up and check whether this is really the case. If
it is, I would like to understand how long these resources would last
before we needed more RAM or machines.

I understand it's all about the numbers. We are still in development, so I
do not have hard numbers, but I can provide close estimates based on the
data we are currently seeing. When we go live we anticipate collecting
5-6 GB of logs per day from about 50 different servers. At this time our
product owner wants to be able to search logs for up to 3 years. We are
still evaluating this and trying to determine whether there is a solution
that would let us offload some of the data in increments and provide a
secondary path to it if needed. So ideally we would have 3 years of
searchable logs. Ultimately I am trying to determine what resources we need
to make Elasticsearch effective and whether we can accommodate these
requirements. From there we can determine the volume and history of logs
we would be able to store and search, to see if this solution is going to
work for our team.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b6ea191c-4fb2-431a-8d67-1854fd78a84e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

dadoonet · December 16, 2013, 3:31pm

It's really hard to tell, and it seems that you have several constraints here:

budget: "Our IT team felt this was very high and cost prohibitive"
history: "Our product owner wants to be able to search logs for up to 3 years"

I feel like you will not be able to satisfy all of these constraints.

I would start by testing with a machine your IT team agrees to give you. Inject as many logs as you can into a single index with a single shard on this instance and measure how long queries take.
At some point, query times will no longer satisfy your product owner's requirements. That is basically the number of documents a single shard can contain.

Then add a new index on the same machine, add the same number of documents, and see how search performs. If it is OK, add another index, and so on…

That gives you the number of shards a single machine can hold given the RAM and CPU you have.
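The test loop described above could be sketched roughly as follows. This is only an illustration: the helper names and the stand-in query are made up for the example, and the real search request against your cluster has to be plugged in where `fake_query` is. The only Elasticsearch-specific parts are the `number_of_shards` and `number_of_replicas` index settings.

```python
import json
import time


def single_shard_settings(replicas=0):
    """Body for an index-creation request: one primary shard, no replicas by default."""
    return {"settings": {"number_of_shards": 1, "number_of_replicas": replicas}}


def median_latency_ms(run_query, repeats=5):
    """Call run_query() several times and return the median wall-clock time in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[len(timings) // 2]


def fake_query():
    """Hypothetical stand-in for a real search request, so the sketch runs without a cluster."""
    time.sleep(0.01)


print(json.dumps(single_shard_settings()))
print(f"median latency: {median_latency_ms(fake_query):.1f} ms")
```

Recording the median latency after each batch of injected logs shows where a single shard stops meeting the target response time.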

Do you want replicas? If so, you will need at least two machines, because you will have twice as many shards to manage.

Let's say you find you can only hold 2 weeks of data. What now? Are you going to ask for a bigger budget? Or are you going to relax the requirements?

5 GB per day is about 5.5 TB of data over 3 years. Let's say roughly 11 TB with 1 replica.
You can start with less than 64 GB of memory. In that case, I think you will need more nodes.
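For anyone who wants to redo this arithmetic with their own numbers, here is a minimal sketch. It assumes the size on disk equals the raw log size, which will not hold exactly in practice, and the function name is just for illustration.

```python
# Back-of-the-envelope storage estimate from the figures in this thread.
# Assumption: indexed size on disk equals raw log size (real indices can differ).

def storage_estimate_tb(daily_gb, years, replicas=1):
    """Return (primary_tb, total_tb) using 365-day years and 1 TB = 1000 GB."""
    primary_gb = daily_gb * 365 * years
    total_gb = primary_gb * (1 + replicas)  # each replica is a full extra copy
    return primary_gb / 1000, total_gb / 1000


for daily in (5, 6):
    primary, total = storage_estimate_tb(daily, years=3)
    print(f"{daily} GB/day: {primary:.2f} TB primary, {total:.2f} TB with 1 replica")
```

At 5 GB per day this gives 5.475 TB of primary data, which is where the rounded 5.5 TB and 11 TB figures come from.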

Sorry if I'm not able to give you an actual number but really I think you need to test your different scenarios. The cool thing with the cloud is that you are able to scale out really easily and test that.

My 0.05 cents.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

Chris_Sham · December 16, 2013, 3:46pm

Thank you for your feedback. I understand your concerns about my
'constraints'. I am trying to assess what resources are necessary to meet
the product owner's goal. Currently in our development environment, when we
start a new server we are given a virtual machine with 20 GB of disk space
and 2 GB of RAM. Asking for a machine with 4 GB of RAM rather than 64 GB is
a much easier task, but I want to make sure that I am asking for an
appropriate amount of resources so that I get fair results in testing. Is
testing how much one machine can handle with 2 GB of RAM sufficient, or is
there a recommended minimum amount I should start with to get a fair test?

dadoonet · December 16, 2013, 3:53pm

2 GB of RAM probably means only 1 GB of heap, which is really small for production use IMHO.
On AWS we often recommend starting with m1.xlarge to avoid noisy neighbors. It comes with 15 GB of RAM by default, so about 7 GB of heap.
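The figures above imply a rule of thumb of giving roughly half of the machine's RAM to the JVM heap. A small sketch of that arithmetic, where the 0.5 fraction is an assumption read off the numbers in this reply and not a hard limit:

```python
# Rough heap estimate, assuming about half of RAM goes to the JVM heap
# (the fraction implied by "2 GB RAM -> 1 GB heap" and "15 GB RAM -> ~7 GB heap").

def heap_estimate_gb(ram_gb, heap_fraction=0.5):
    """Return the approximate heap size in GB for a machine with ram_gb of RAM."""
    return ram_gb * heap_fraction


for ram in (2, 4, 15, 64):
    print(f"{ram} GB RAM -> about {heap_estimate_gb(ram):.1f} GB heap")
```

The rest of the RAM is not wasted: the operating system uses it for the filesystem cache.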
