Kindly demystify memory settings

Hi All,

I have been playing with memory settings for some time now.
My ultimate aim is simple - use as much main memory as possible. At the
very least, move the index alone (without _source) into main memory if
there is not enough space for everything.

My experiments went as follows.

I went through this page -
http://www.elasticsearch.org/guide/reference/index-modules/store.html

Experiment 1
I found that there is a memory store, so I tried starting with the
following settings in elasticsearch.yml:

store.type : memory
cache.memory.small_buffer_size: 1mb
cache.memory.large_buffer_size: 10mb
cache.memory.small_cache_size: 1000mb
cache.memory.large_cache_size: 2000mb

Result - With the above settings, nothing happened to an existing
index. Worse, nothing happened to a new index created after setting the
above configuration.

Experiment 2
I set the default gateway to none, with the settings specified in
Experiment 1.

Result - Here main memory was fully used, but on restart I lost the
entire data. And it is clear that when main memory runs out, something
bad is going to happen.

Experiment 3
Then I found a silver bullet - this setting:

store.fs.memory.enabled : true

Result - With this setting and the settings mentioned in Experiment 1,
on creation of a new node, main memory is heavily used and I am able to
keep data between restarts.

Experiment 4
One node has default settings, and I am starting a new node. Here the
first node is a low-end machine and cannot spare any RAM. I was thinking
of sending search queries to the new node and inserts to the old node. As
node-2 makes heavy use of main memory, search queries should execute
quite fast. That was my plan.

So node-1 has default settings and node-2 has the settings specified in
Experiment 3.

Result - I started node-2 with the same cluster name as node-1, but
instead of taking the memory settings from its own config file, node-2
picked up the store and cache settings from node-1.


Finally, coming to my doubts:

  • What do the buffer and cache settings in the configuration from
    Experiment 1 signify?
  • What is the difference between the small and large cache or buffer
    settings? Is it the same as minimum and maximum allotted memory?
  • What exactly happens when I enable store.fs.memory.enabled? Also,
    despite being important, this setting is not mentioned in the
    elasticsearch.yml file.
  • In a cluster, is it possible for a node to have different memory
    settings than another node? If the machines in the cluster are not
    alike, it would otherwise be hard to fully use the capacity of the
    high-end machines.
  • Are there any other memory settings that I missed which could help me
    better use main memory and improve search query performance?

Thanks

       Vineeth

--

Search queries go to all shards that make up an ES index (1), so the
statement "apply search queries on the new node" does not sound
technically correct.
On insert, documents are distributed across an ES index (onto all shards)
based on a hash of their ID (2).

(1) Unless routing tells it there can't be any documents on another
shard, but that is only an optimization.
(2) Or a hash of a specific routing value, but the algorithm to decide
where to put something is the same.
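
A minimal sketch of what that looks like in practice (the index, type,
and field names here are made-up examples; without a routing value, a
search fans out to every shard):

# myindex/mytype/user42 are hypothetical names.
# Index a document with an explicit routing value; the target shard is
# derived from a hash of the routing value modulo the shard count.
curl -XPUT 'http://localhost:9200/myindex/mytype/1?routing=user42' -d '{
  "user" : "user42", "body" : "hello"
}'

# A search carrying the same routing value only touches the shard that
# can contain matching documents; without it, all shards are queried.
curl -XGET 'http://localhost:9200/myindex/_search?routing=user42' -d '{
  "query" : { "term" : { "user" : "user42" } }
}'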

HTH,
-Paul

On 8/22/2012 2:04 AM, Vineeth Mohan wrote:

I was thinking of sending search queries to the new node and inserts
to the old node. As node-2 makes heavy use of main memory, search
queries should execute quite fast. That was my plan.

--

What are you doing exactly? The cache.memory… settings only apply when the store type is set to memory. store.fs.memory.enabled is not really used and was removed a long time ago (it proved to be not that useful).
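
For reference, a minimal sketch of how the memory store was meant to be
enabled, per the store-module docs of that era (setting names as in the
0.19.x documentation - verify them against your own version):

# elasticsearch.yml - note the index. prefix on the store type
index.store.type: memory
cache.memory.small_buffer_size: 1mb
cache.memory.large_buffer_size: 10mb
cache.memory.small_cache_size: 1000mb
cache.memory.large_cache_size: 2000mb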

On Aug 22, 2012, at 12:04 PM, Vineeth Mohan vineethmohan@algotree.com wrote:


--

Hello Shay,

I am trying to improve query response time on the Elasticsearch side.
I have set the store type to memory, but I found that on restart the
amount of memory consumed was far less than the size of the index.
How can I keep the whole index and data in main memory?

Thanks
Vineeth

On Thu, Aug 30, 2012 at 3:01 AM, Shay Banon kimchy@gmail.com wrote:


--

Hi Vineeth,

There are several aspects to your question. There is no "binary switch"
for moving an index from disk into RAM or vice versa.

In short, you can

  1. let the OS decide whether Lucene files should be loaded into RAM via
    the file system cache (store type "niofs")
  2. force the OS to load Lucene files into RAM for reads (store type
    "mmapfs", optionally with mlockall; see the sketch after this list)
  3. force the JVM to keep the index entirely on the heap (store type
    "memory"; GC gets more work, the OS can still swap to disk, and note
    that the Lucene "RAMDirectory" implementation is only for quick tests,
    not for production)
  4. set a large heap size and enable the ES caches with soft references
    (several cache types: field cache, filter cache, facet cache, etc.)
  5. set a large heap size and use the ES warming API to fill the caches
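
As an illustration of options 1-3, a minimal elasticsearch.yml sketch
(setting names as documented around 0.19.x; treat them as assumptions to
verify against your own version):

# pick the store implementation (node-wide here; can also be set per index)
index.store.type: mmapfs    # option 2; "niofs" for option 1, "memory" for option 3
# try to lock the process memory so the OS does not swap it out; the
# Elasticsearch user needs a suitable "ulimit -l" for this to succeed
bootstrap.mlockall: true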

I assume average search response times do not differ much between store
types over time, because several system components of the machine already
try to optimize resource usage. What varies is scalability, that is,
whether the OS and JVM can handle the load efficiently as the index and
query volume increase. There are tradeoffs and edge cases you need to
take extra care of: loading everything into RAM may suddenly take a lot
of time and may look like the system is stalling, but subsequent searches
are very fast.

The gateway mechanism can persist the index (a kind of secondary storage)
and is separate from the search index.
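
That is what Experiment 2 ran into; a sketch of the relevant node-level
setting (as documented around 0.19.x; verify for your version):

gateway.type: local   # the default; indices are recovered from local disk on restart
# gateway.type: none  # nothing is persisted - data is gone after a full cluster restart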

Personally, I let the OS decide what to keep in RAM.

Best regards,

Jörg

On Friday, August 31, 2012 7:38:41 AM UTC+2, Vineeth Mohan wrote:


--

Hello Jörg,

Thanks for your reply.

But I have a few doubts.
On this page -
http://www.elasticsearch.org/guide/reference/index-modules/store.html
it is written:

"The best one for the operating environment will be automatically chosen:
mmapfs on Solaris/Windows 64bit, simplefs on Windows 32bit, and niofs for
the rest."

I was under the impression that only simplefs was available on Linux.
Let me try the options you have provided and get back to you.

Thanks
Vineeth

On Fri, Aug 31, 2012 at 2:15 PM, Jörg Prante joergprante@gmail.com wrote:


--

Hi Vineeth,

No, simplefs is not the default.

niofs is used on all 32-bit OSes and on non-Linux/Solaris/Windows OSes
that do not support mmapfs (the "unmap" call; e.g. Mac OS X, FreeBSD).

mmapfs should be chosen on all 64-bit OSes because of the large virtual
address space. Linux is not a problem here for Lucene; see also
https://issues.apache.org/jira/browse/LUCENE-3198

The reason simplefs was chosen on Win32 is a known severe bug in older
JVMs on that platform that dramatically affects NIO performance.
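
To override the automatic choice, the store type can also be set
explicitly when an index is created; a hypothetical example (the index
name is made up):

# testindex is a made-up name; index.store.type per the 0.19.x docs
curl -XPUT 'http://localhost:9200/testindex' -d '{
  "settings" : { "index.store.type" : "mmapfs" }
}'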

Maybe you are also interested in

http://jprante.github.com/applications/2012/07/26/Mmap-with-Lucene.html

I would love to see numbers, so if you can publish benchmark results,
that would be very welcome.

Best regards,

Jörg

On Sunday, September 2, 2012 6:15:41 AM UTC+2, Vineeth Mohan wrote:


--