ELS memory consumption

How much memory should be available for the ELS process? I am running
ELS on a W2k8 server; the index has around 100M documents and is just
under 50GB on disk. It looks like a heap size of 2GB is sufficient, but
mapped files take another 2.5GB, so the overall memory allocated to the
process is closer to 5GB.

My question is: how can I estimate the amount of memory needed for
mapped files based on the size of the indexes? Also, is there a way (or
a need) to control it?


On Windows, Lucene (and elasticsearch) will default to mapped files for
better performance. You can disable that and use the simplefs index store
type if you want. The mapped files will take the same size as the actual
index files end up taking. Regarding the actual heap, it's hard to answer.
Lucene internally loads data into memory to be able to search faster
(basically intervals of terms), and there is the field "cache", which is
basically used for sorting (on something other than score) and for faceting
(this is exposed via the node stats and index stats APIs).
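As a rough illustration (assuming an 0.x-era elasticsearch; the exact
setting names and endpoints can vary between versions, so check the docs
for the release you are running), switching the store type and checking
the stats might look like this:

    # elasticsearch.yml - override the Windows default (mmapfs)
    index.store.type: simplefs

    # Look at heap usage and the field cache via the node stats API
    curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'

The store type can also be set per index in the index settings at
creation time if you only want to change it for some indexes.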

Hmm... are you saying that it will try to load the entire 50GB into mapped
file(s)? During the index load I got away with a box with 8GB of RAM and no
swap file. I've seen the mapped-file memory allocated to the process go all
the way up to 5GB and then back down to 2GB (on top of a 1.5GB heap) without
any visible impact on the document load speed - around 11 min per 1M
documents. The chart of memory allocated to the mapped files looks like a
saw tooth - as if some sort of GC is going on there.
Does it mean that adding more RAM can increase performance? I am not too
concerned about load performance - my data are pretty static, but search
performance is important.
