Heap sizing

Hi all,
I'm running my cluster in the cloud with local storage at present
(meaning local to the hosts), and I'm also using the local gateway for
storage. At present our nodes have 4 GB of RAM, of which I'm
allocating 3 GB to the JVM. Is there some consensus with ES on heap
sizing? Do I want to keep my JVM trimmed to take more advantage of
the disk cache, or should I allocate all available RAM to the heap?

Rule of thumb: half for the JVM, half for the disk cache.

Peter.
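
To put numbers on the rule of thumb for the 4 GB nodes in question, here
is a minimal sketch. The function name `split_ram` and its
`heap_fraction` parameter are made up for illustration, and the
calculation ignores memory used by the OS itself and by other processes,
so the real page-cache headroom will be somewhat lower.

```python
def split_ram(total_mb, heap_fraction=0.5):
    """Return (heap_mb, remaining_mb) for a node with total_mb of RAM.

    remaining_mb is what is left outside the JVM heap; the OS can use
    part of it as disk (page) cache. This is only arithmetic, not a
    tuning recommendation.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(total_mb * heap_fraction)
    return heap_mb, total_mb - heap_mb


# Rule of thumb from this thread: half for the JVM, half for the disk cache.
print(split_ram(4096))        # (2048, 2048)

# The setup described in the question: 3 GB of heap on a 4 GB node.
print(split_ram(4096, 0.75))  # (3072, 1024)
```

With the current 3 GB heap only about 1 GB is left over, versus about
2 GB with the half-and-half split.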

Hard to say without more data. You can check the memory usage behavior
using something like bigdesk.
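
If you would rather look at the raw numbers than a dashboard, the same
heap figures can be computed from a nodes stats response. The sketch
below only processes an already-fetched JSON document. The field names
(`jvm.mem.heap_used_in_bytes`, `heap_max_in_bytes`) are an assumption
based on the nodes stats API and may differ between Elasticsearch
versions, and the sample data is invented.

```python
def heap_usage(nodes_stats):
    """Map node name -> percentage of the maximum heap currently in use.

    nodes_stats is the parsed JSON of a nodes stats response. The key
    names used here are assumed and may vary by Elasticsearch version.
    """
    usage = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        used = mem["heap_used_in_bytes"]
        limit = mem["heap_max_in_bytes"]
        # Fall back to the node id if the response carries no name.
        usage[node.get("name", node_id)] = round(100.0 * used / limit, 1)
    return usage


# Invented sample: one node using 2 GB of a 3 GB heap.
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "jvm": {
                "mem": {
                    "heap_used_in_bytes": 2 * 1024**3,
                    "heap_max_in_bytes": 3 * 1024**3,
                }
            },
        }
    }
}
print(heap_usage(sample))  # {'node-1': 66.7}
```

Sampling this over time, rather than once, gives a better picture of how
much heap is really needed.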

On Sun, Jan 22, 2012 at 11:54 PM, Grant grant@brewster.com wrote:

Hi all,
I'm running my cluster in the cloud with local storage at present
(meaning local to the hosts. I'm also using the local gateway for
storage). At present our nodes have 4gb of ram, of which I'm
allocating 3gb to the jvm. Is there some consensus with ES on heap
sizing? Do I want to keep my jvm trimmed to take more advantage of
disk cache, or should I allocate all available ram to the heap?

Right now between half and 75% of the heap allocation is actually being
used, depending on what's going on. But our data set is much smaller
than it will eventually be (although obviously we'll add nodes as
required to balance RAM requirements with data size).

If all the disk caching is done outside the JVM, at my present
allocation I've only got 1 GB of free RAM. If I need to keep my data
set that small per node to ensure most of it resides in the disk cache,
I think I'll need to reduce the heap.

Our data set is going to be composed of a LOT of very small indices (in
the neighborhood of a few hundred KB to ~50 MB at the top end).
We're running with one shard (i.e. no sharding) and 3 replicas per index.

On Jan 23, 2:01 pm, Shay Banon kim...@gmail.com wrote:

Hard to say without more data. You can check the memory usage behavior
using something like bigdesk.

The disk cache is there to help the OS speed up operations; the more
memory you leave for it, the better.

But you say you are going to have a LOT of small indices. Even with one
shard per index, you will probably overload the cluster you have, since
a single shard is not lightweight (unless you are going to have a large
cluster). See this thread on how to use routing to solve something like
this:
https://groups.google.com/forum/#!searchin/elasticsearch/data$20flow/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
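
To make the shard-count concern concrete, here is a rough
back-of-the-envelope sketch. The function names and the example figures
(1000 indices, 4 nodes) are hypothetical and not taken from the actual
cluster; only the one shard and 3 replicas per index come from the
setup described above.

```python
def shard_copies(num_indices, primaries=1, replicas=3):
    """Total shard copies in the cluster: each primary plus its replicas."""
    return num_indices * primaries * (1 + replicas)


def copies_per_node(num_indices, num_nodes, primaries=1, replicas=3):
    """Average number of shard copies each node hosts, assuming even spread."""
    return shard_copies(num_indices, primaries, replicas) / num_nodes


# Hypothetical example: 1000 small indices on a 4-node cluster.
print(shard_copies(1000))        # 4000
print(copies_per_node(1000, 4))  # 1000.0
```

Each of those copies carries its own fixed overhead regardless of how
little data it holds, which is why many tiny indices add up quickly.
Note also that with 3 replicas a cluster needs at least 4 nodes before
every copy can be allocated, since copies of the same shard are not
placed on the same node.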

On Mon, Jan 23, 2012 at 10:44 PM, Grant grant@brewster.com wrote:

Right now between half and 75% the heap allocation is actually being
used depending on what's going on. But our data set is much smaller
than it will eventually be (although obviously we'll add nodes as
required to balance ram requirements with data size).

If all the disk caching is done outside the jvm, at my present
allocation I've only got 1gb of free RAM... if I need to keep my data
set that small per node to ensure most of it resides in the disk cache
I think I'll need to reduce the heap.

Our data set is going to be comprised of a LOT of very small (in the
neighborhood of a few hundred kb to ~50mb at the top end) indices.
We're running with one (i.e. no) shards and 3 replicas per index.
