If you have hundreds of gigabytes of memory on each physical machine, is it better to create one elasticsearch node with as much memory as possible, or to create several instances with smaller heap sizes on different ports? What are the pros and cons of each approach?
The suggested configuration for elasticsearch is to make the JVM heap roughly half the size of RAM (to be used by the field/filter caches), so that the remaining RAM can be used by the OS to cache the Lucene indices. That said, the JVM cannot use compressed object pointers once the heap exceeds roughly 32GB, so giving the heap half the RAM (50GB+) would not be efficient.
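As a concrete example, you would cap the heap just under the compressed-oops cutoff rather than at half the RAM. A minimal sketch, assuming the stock elasticsearch.in.sh startup script, which reads ES_HEAP_SIZE (30g is just an assumed safe margin; the exact cutoff varies by JVM build):

    # Keep the heap under ~32GB so the JVM can still use compressed oops.
    export ES_HEAP_SIZE=30g

    # Everything above the heap is left to the OS page cache,
    # which Lucene relies on for fast index access.
    bin/elasticsearch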
In your case, it is better to run multiple elasticsearch instances, each with a heap size below 32GB. I have never done so myself, but you would probably need to experiment with the direct memory size so that each instance can use the OS cache effectively. Can anyone else comment on this last part? Should each node set ES_DIRECT_SIZE, or should the OS be left to manage the caches of both instances? Since the workloads should be identical, the OS might do a good job.
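To make that concrete, starting two instances on one box might look roughly like this. A sketch only: it assumes a 0.90-era install where -Des.* flags set config options and elasticsearch.in.sh reads ES_HEAP_SIZE/ES_DIRECT_SIZE; the ports, paths, and the 4g direct-size value are made-up examples:

    # Instance 1: default ports (9200/9300), its own data path.
    ES_HEAP_SIZE=30g bin/elasticsearch -Des.node.name=node1 -Des.path.data=/data/node1

    # Instance 2: shifted HTTP/transport ports so the two do not collide;
    # ES_DIRECT_SIZE caps direct memory (maps to -XX:MaxDirectMemorySize).
    ES_HEAP_SIZE=30g ES_DIRECT_SIZE=4g bin/elasticsearch -Des.node.name=node2 \
        -Des.http.port=9201 -Des.transport.tcp.port=9301 -Des.path.data=/data/node2

With several nodes per host, setting cluster.routing.allocation.same_shard.host: true keeps a primary shard and its replica from landing on the same physical machine.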
We're looking at either virtualising all the nodes or using containers (docker.io) to maximise efficiency. We'd be interested in anyone else's experience with either approach.
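For the container route, something along these lines is what we have in mind. Purely illustrative: "my/elasticsearch" is a placeholder image (not an official one), and the memory limit and paths are assumptions:

    # Cap the container's memory, map the HTTP/transport ports out,
    # and mount a host directory for the index data.
    docker run -d -m 40g \
        -p 9200:9200 -p 9300:9300 \
        -v /data/node1:/data \
        my/elasticsearch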