What OS memory does ES use other than the Java heap?

Edward_Sargisson · May 6, 2014, 8:35pm

Hi all,
We have a problem where our ES nodes fail with an out-of-memory error
from Linux (note: not from Java). Our ES processes are configured with a fixed
amount of heap (60% of total RAM, just as in the elasticsearch Chef
cookbook).

So, something is consuming all of the memory available to Linux.

Is there any other memory that ES can use? Does it lock OS cache or buffer
memory so that it can't be released? If it opens lots of files, does it use
up too much RAM? Is it doing off-heap allocation? (I'm pretty sure the
answer to the last one is no.)

We're struggling to find the exact memory resource being used up.

For the record, this is ES 1.1.0 on CentOS 6.4 running in VMware.

Thanks!
Edward

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ab6421e3-89a1-409f-b89b-f09ca5bc9551%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · May 6, 2014, 10:23pm

Yes, of course Elasticsearch uses off-heap memory. All Lucene index
I/O uses direct buffers in native OS memory.

Errors in allocating direct buffers will result in Java errors. You mention
Linux memory errors, but unfortunately you do not quote them, so I have to
guess.

You should have memory-mapped files enabled via the mmapfs index store (the
default on RHEL), so that all files read by ES are mapped into the virtual
address space managed by the OS VM subsystem.

You should also have bootstrap.mlockall = true, which means you also need to
set memlock to unlimited in /etc/security/limits.conf, because RHEL/CentOS
limits memlockable memory to 25% of RAM by default. If that limit is hit,
Java should throw an IOException "Map failed".
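To confirm which memlock limit the running ES process actually received (limits.conf only applies to new sessions), the kernel exposes per-process limits in /proc/<pid>/limits on its "Max locked memory" line. A minimal sketch, assuming you know the ES PID; parse_memlock, memlock_unlimited and limits_conf_lines are hypothetical helpers, and the user name "elasticsearch" is an assumption.

```python
import sys

def parse_memlock(limits_text: str) -> tuple[str, str]:
    """Return the (soft, hard) 'Max locked memory' values from /proc/<pid>/limits text."""
    label = "Max locked memory"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units ("bytes")
            fields = line[len(label):].split()
            return fields[0], fields[1]
    raise ValueError("no 'Max locked memory' line found")

def memlock_unlimited(limits_text: str) -> bool:
    """True only if both the soft and the hard limit are 'unlimited'."""
    return parse_memlock(limits_text) == ("unlimited", "unlimited")

def limits_conf_lines(user: str) -> list[str]:
    """Entries for /etc/security/limits.conf, format: <domain> <type> <item> <value>."""
    return [f"{user} soft memlock unlimited", f"{user} hard memlock unlimited"]

if __name__ == "__main__":
    # Hypothetical usage: pass the ES PID as the first argument;
    # without one, this script inspects its own process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        with open(f"/proc/{pid}/limits") as f:
            soft, hard = parse_memlock(f.read())
        print(f"memlock soft={soft} hard={hard}")
    except (OSError, ValueError) as err:
        print(f"could not read memlock limits for {pid}: {err}")
```

Anything other than "unlimited" in both columns means mlockall can only pin part of the heap.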

Note that, because page locking depends on support from the host OS, you
should also check which kind of virtualization is enabled for the guest: it
should be HW (full) virtualization, not paravirtualization.

If you still encounter Linux OS errors, they are most probably caused by
VMware limitations, in which case you should disable the bootstrap.mlockall
setting.

As a side note, the recommended heap size is 50% of the RAM that is
available to the ES process. If you run a VM, you should assign at most 50%
of the configured guest OS memory to ES.
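The 50% rule is plain arithmetic. A minimal sketch for a hypothetical 16 GB guest, comparing it with the cookbook's 60%; heap_mb and jvm_heap_flags are illustrative names, and the output is just the standard -Xms/-Xmx JVM flags with equal values for a fixed-size heap.

```python
def heap_mb(guest_ram_mb: int, percent: int = 50) -> int:
    """Heap size in MB: `percent` of the memory configured for the guest OS."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return guest_ram_mb * percent // 100

def jvm_heap_flags(heap: int) -> str:
    """Fixed-size heap: identical -Xms and -Xmx so the heap never grows or shrinks."""
    return f"-Xms{heap}m -Xmx{heap}m"

guest = 16384  # hypothetical 16 GB guest
for pct in (50, 60):
    heap = heap_mb(guest, pct)
    print(f"{pct}%: {jvm_heap_flags(heap)}, leaves {guest - heap} MB for the OS")
```

At 60%, the OS keeps 6554 MB instead of 8192 MB for page cache, memory maps and every other process in the guest.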

Jörg

Edward_Sargisson · May 7, 2014, 11:09pm

Hi Jörg,
Thanks for your reply - that's given me a number of leads to follow up on.

> Errors in allocating direct buffers will result in Java errors. You
> mention Linux memory errors but unfortunately you do not quote it, so I
> have to guess.
We see nothing useful in the elasticsearch logs. What we do see is the
console saying either "Out of memory: Kill process ... score 1 or sacrifice
child" or, once, "Loading dm-mirror.ko module, Waiting for required
block device discovery, Waiting for 2 sda-like device(s)...Kernel panic -
not syncing: Out of memory and no killable processes".
The first message I understand as the OOM killer coming out to whack a
process on the head. I don't understand the last one. I have screenshots of
these if required.
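The second message means the kernel ran out of memory and found no process it was allowed to kill, so it panicked instead. A minimal sketch for pulling both kinds of event out of saved kernel log text (for example /var/log/messages or dmesg output); the regular expression assumes the RHEL 6-era wording quoted above, and find_oom_events is a hypothetical helper.

```python
import re

# Wording assumed from RHEL 6-era kernels, e.g.
# "Out of memory: Kill process 1234 (java) score 1 or sacrifice child"
OOM_KILL = re.compile(
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<comm>[^)]*)\) "
    r"score (?P<score>\d+) or sacrifice child"
)
OOM_PANIC = "Kernel panic - not syncing: Out of memory and no killable processes"

def find_oom_events(log_text: str) -> tuple[list[tuple[int, str, int]], bool]:
    """Return (killed processes as (pid, command, score), whether an OOM panic was logged)."""
    kills = [
        (int(m["pid"]), m["comm"], int(m["score"]))
        for m in OOM_KILL.finditer(log_text)
    ]
    return kills, OOM_PANIC in log_text
```

Running it over the log text from each failure shows which process the OOM killer picked each time.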

> You should have enabled memory mapped files by index store mmapfs
> (default on RHEL)
We haven't changed this setting, so I expect it is the default. I looked
for a way to verify this, but the ES API appears not to return it.
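One indirect check: if the index store is memory-mapping files, the Lucene files under the data directory appear as file-backed entries in /proc/<pid>/maps. A minimal sketch, assuming you know the ES PID and its data directory (/var/lib/elasticsearch in the test data is an assumption; use your own path.data); mapped_files_under is a hypothetical helper.

```python
def mapped_files_under(maps_text: str, data_dir: str) -> list[str]:
    """Distinct file paths under data_dir that appear in /proc/<pid>/maps text."""
    prefix = data_dir.rstrip("/") + "/"
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname];
        # anonymous mappings have no pathname column.
        parts = line.split(None, 5)
        if len(parts) == 6 and parts[5].startswith(prefix):
            paths.add(parts[5])
    return sorted(paths)

def read_maps(pid: str) -> str:
    """Raw /proc/<pid>/maps text for a given PID (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return f.read()
```

A non-empty result for the ES PID suggests index files are being memory-mapped; an empty one suggests a non-mmap store.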

> bootstrap.mlockall = true...set memlock to unlimited
Yes, both done.

> If you still encounter issues from Linux OS errors it is most probably
> because of VMware limitations
Is there a way to get evidence of this? I reviewed the VMware event
log and there was no ballooning in it (assuming we were looking in the
right place).

> If you run a VM, you should assign at most 50% of the configured guest
> OS memory to ES.
We use the elasticsearch Puppet module, but I modified it with a version of
the code in the elasticsearch Chef cookbook to assign the heap automatically,
and that code appears to assign 60%. I was surprised by this too, but I
copied it on the assumption that the cookbook's author knew what they were
doing. I've raised an issue to ask the question: "Why does this cookbook
set the es max heap size to 60% of available memory?" (Issue #209 in
sous-chefs/elasticsearch on GitHub).

For the curious: I've set up some monitoring to capture /proc/meminfo, the
count of entries in /proc//maps for elasticsearch and Flume, and the top
few entries in top by memory usage. Now I'm just waiting for the next
failure.
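A minimal sketch of the /proc/meminfo half of such monitoring, assuming snapshots are taken periodically and kept for comparison after a failure. The watched field list is an illustrative choice; Mlocked, where the kernel reports it, shows memory pinned by mlock (for example by bootstrap.mlockall) that cannot be reclaimed.

```python
from typing import Dict, Optional

WATCH = ("MemTotal", "MemFree", "Buffers", "Cached", "AnonPages", "Mlocked")

def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo field to its first numeric value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info

def snapshot(text: str) -> Dict[str, Optional[int]]:
    """Only the watched fields; None where this kernel does not report a field."""
    info = parse_meminfo(text)
    return {name: info.get(name) for name in WATCH}
```

For example, `snapshot(open("/proc/meminfo").read())` taken every minute gives a timeline in which MemFree collapsing while Mlocked stays flat would point at something outside the locked heap.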

Thanks for any help provided.

Cheers,
Edward

Edward_Sargisson · May 30, 2014, 12:16am

For those following along at home I thought I'd provide an update.

Jörg provided a big hint, and all his advice was useful. Based on what he
said, we discovered that the VM host was performing memory ballooning on the
guest. Briefly, this is a process by which the host reclaims memory from the
guest: a balloon driver inside the guest inflates, grabbing memory from the
guest OS and handing it back to the host. In effect, it transfers any memory
pressure the host is under to the guest. (Google it for more.)

We showed that our failures were happening when ballooning occurred. By
default, the balloon is designed to inflate to 60% (from memory) of the
configured memory. Given that 50% of the configured memory is mlocked, these
two settings are incompatible: together they would claim 110% of the guest's
memory.
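The incompatibility is easy to check in percentages. A minimal sketch under a simplified model that is an assumption, not VMware's documented algorithm: the balloon can claim up to its cap but never memory the host has reserved for the VM, and mlocked pages cannot be handed to it. headroom_pct is a hypothetical helper.

```python
def headroom_pct(mlocked_pct: int, balloon_cap_pct: int, reserved_pct: int = 0) -> int:
    """Percent of configured guest memory left after mlocked pages and a fully
    inflated balloon; a negative result means both cannot be satisfied at once.

    Simplified model (assumption): the balloon can only reclaim unreserved memory.
    """
    reclaimable = min(balloon_cap_pct, 100 - reserved_pct)
    return 100 - mlocked_pct - reclaimable

print(headroom_pct(50, 60))                    # -10 (the figures above)
print(headroom_pct(50, 60, reserved_pct=100))  # 50 (all configured memory reserved)
```

In this model, reserving all configured memory drives the reclaimable share to zero, which matches the fix described next.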

Our fix was to configure VMware to reserve the entire configured memory.
This means that the host doesn't try to take the memory back. It seemed
sensible to reserve all of the configured memory, as we want elasticsearch
to keep its buffers and memory maps in place, just as it would on a
hardware instance in production. If placed under memory pressure, the OS
would start to reclaim these things.

After making the change, we've been running for a few weeks with no further
failures.

Cheers,
Edward

nik9000 · May 30, 2014, 1:48pm

That's great to hear! Memory ballooning is mostly fine for application
servers or job servers, but it is the kiss of death for databases and
database-like things. You may want to take an inventory of which VMs have
memory ballooning enabled and make sure they can handle it.

Nik