Elasticsearch with a large amount of data

Hi,

I have 5 Elasticsearch nodes with 4 CPUs and 8 MB of RAM.

My index currently holds 1 TB of data and grows by about 100 GB per day. I
configured 3 primary shards and 1 replica, but Elasticsearch hits an
OutOfMemory error every two days.

Is there some configuration that resolves this problem?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1c018f48-d760-43a3-9878-e3608a113d1f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

First, I am going to assume you mean 8 GB of memory; otherwise I am very
impressed that Elasticsearch runs at all.

Second, when are you running out of memory?
Do you run out of memory while indexing?
Is it a specific document when indexing?
Do you run out of memory when searching?
Is it a specific search when searching?
What type of search, sort, filter?
How many documents do you index each day?
How large is the largest document?
How large is the average document?
Are you indexing in batches?
How big are your batches?
Of your 8 GB, how much is allocated to Elasticsearch?
How much is allocated to the file system cache?
(I usually start by giving 2 GB to the OS and splitting the remaining RAM
between Elasticsearch and the file system cache. On an 8 GB machine this
means allocating 3 GB to Elasticsearch.)
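
The split described above can be written out as a small calculation. This is only a sketch of that rule of thumb; the function name and the 2 GB default are illustrative, not an Elasticsearch setting.

```python
def split_memory(total_gb, os_reserve_gb=2.0):
    """Apply the rule of thumb: reserve RAM for the OS, then split the
    rest evenly between the Elasticsearch heap and the file system cache.

    Returns (heap_gb, fs_cache_gb).
    """
    if total_gb <= os_reserve_gb:
        raise ValueError("no RAM left after the OS reserve")
    remaining = total_gb - os_reserve_gb
    heap_gb = remaining / 2
    fs_cache_gb = remaining - heap_gb
    return heap_gb, fs_cache_gb


for ram_gb in (8, 16):
    heap_gb, cache_gb = split_memory(ram_gb)
    print(f"{ram_gb} GB box: heap {heap_gb} GB, file system cache {cache_gb} GB")
```

For an 8 GB machine this gives 3 GB of heap, and for a 16 GB machine 7 GB.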

As a rough guess based on the very little info you have provided, I would
say that your cluster does not have enough RAM for the amount of data you
are trying to load into it. In general I have found that Lucene indexes
like to be in memory. When they cannot be, performance is poor and
operations can fail. By indexing 100 GB of data a day, you are asking
Elasticsearch to store some pretty large segments for 8 GB of memory
(effectively 3 GB for ES).
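
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the 1 TB figure is primary data only, that 1 TB = 1000 GB, and that data is spread evenly across the nodes (with 3 primaries and 1 replica on 5 nodes it will not be perfectly even). The function name is hypothetical.

```python
def per_node_load(primary_data_gb, replicas, nodes, heap_gb):
    """Estimate on-disk index data per node and its ratio to the heap.

    Assumes an even spread of all shard copies across the nodes.
    Returns (data_per_node_gb, data_to_heap_ratio).
    """
    total_gb = primary_data_gb * (1 + replicas)  # primaries plus replica copies
    data_per_node_gb = total_gb / nodes
    return data_per_node_gb, data_per_node_gb / heap_gb


data_gb, ratio = per_node_load(primary_data_gb=1000, replicas=1, nodes=5, heap_gb=3)
print(f"about {data_gb:.0f} GB of index data per node, "
      f"roughly {ratio:.0f}x the 3 GB heap")
```

Under these assumptions each node holds about 400 GB of index data against a 3 GB heap, and that gap widens by roughly 100 GB of primary data per day.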

From this page:

A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB
machines are also common. Less than 8 GB tends to be counterproductive (you
end up needing many, many small machines), and greater than 64 GB has
problems that we will discuss in Heap: Sizing and Swapping
(http://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html).

Also review:

The default installation of Elasticsearch is configured with a 1 GB heap.
For just about every deployment, this number is far too small. If you are
using the default heap values, your cluster is probably configured
incorrectly.
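
One way to check whether any node is still on the default heap is to look at the maximum heap each node reports. The sketch below assumes a dict shaped like the nodes stats API response (`GET /_nodes/stats/jvm`), reading `jvm.mem.heap_max_in_bytes` per node; verify the field layout against your version. The sample data, node names, and helper name are made up for illustration.

```python
GIB = 1024 ** 3


def nodes_with_small_heap(stats, limit_bytes=GIB):
    """Return the names of nodes whose maximum heap is at or below the limit.

    `stats` is expected to look like a parsed nodes stats response:
    {"nodes": {<node_id>: {"name": ..., "jvm": {"mem": {...}}}}}
    """
    flagged = []
    for node in stats.get("nodes", {}).values():
        heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
        if heap_max <= limit_bytes:
            flagged.append(node.get("name", "<unnamed>"))
    return sorted(flagged)


# Hypothetical sample: one node near the 1 GB default, one with a 3 GB heap.
sample_stats = {
    "nodes": {
        "node-id-1": {"name": "es-1", "jvm": {"mem": {"heap_max_in_bytes": 1038876672}}},
        "node-id-2": {"name": "es-2", "jvm": {"mem": {"heap_max_in_bytes": 3 * GIB}}},
    }
}

print(nodes_with_small_heap(sample_stats))
```

Any node this flags is likely still running with the default heap and is worth checking first.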

I ran into similar problems when indexing on machines that had only 8 GB of
memory. My data volume was lower than what you have indicated.
Upgrading to larger instances with 16 GB resolved the issue and I have not
had a problem since. Of course I had already tuned everything according
to what I outlined above. The 16 GB box means that instead of 3 GB for ES
you have (16 GB - 2 GB) / 2 = 7 GB, more than double. In consulting
engagements I always recommend 16 GB as a bare minimum, but 32 GB as a
realistic minimum.

This page also has some good info on it:

Aaron

On Thursday, March 12, 2015 at 6:12:11 PM UTC-6, Jeferson Martins wrote:

Hi,

I have 5 Elasticsearch nodes with 4 CPUs and 8 MB of RAM.

My index currently holds 1 TB of data and grows by about 100 GB per day. I
configured 3 primary shards and 1 replica, but Elasticsearch hits an
OutOfMemory error every two days.

Is there some configuration that resolves this problem?
