Memory usage of the machine with ES is continuously increasing


(Pradeep Reddy) #1

ES version 1.5.2
Arch Linux on Amazon EC2
Of the available 16 GB, 8 GB is heap (mlocked). Memory consumption is
continuously increasing (225 MB per day).
Total number of documents is around 800k (500 MB).

cat /proc/meminfo shows:

Slab: 3424728 kB

SReclaimable: 3407256 kB

curl -XGET 'http://localhost:9200/_nodes/stats/jvm?pretty'

"heap_used_in_bytes" : 5788779888,
"heap_used_percent" : 67,
"heap_committed_in_bytes" : 8555069440,

slabtop
    OBJS   ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
17750313 17750313 100%    0.19K 845253       21   3381012K dentry
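
As a rough sanity check on the slabtop figures, the arithmetic below reproduces the slab count and cache size from the object count, and estimates how many new dentries 225 MB per day would correspond to. It is only a sketch: the 192-byte object size and the 4 KiB slab size are assumptions inferred from the rounded 0.19K and the 21 objects per slab, and the function names are made up for illustration.

```python
# Figures copied from the slabtop line in the post.
OBJS = 17_750_313
OBJ_PER_SLAB = 21
SLABS = 845_253
CACHE_SIZE_KB = 3_381_012

# Assumptions (not stated in the thread): slabtop rounds the object size
# to 0.19K, and 21 objects of 192 bytes fit into one 4 KiB slab.
ASSUMED_OBJ_BYTES = 192
ASSUMED_SLAB_KB = 4


def slabs_needed(objs, per_slab):
    """Number of slabs required to hold objs objects (ceiling division)."""
    return -(-objs // per_slab)


def dentries_per_day(growth_mb_per_day, obj_bytes=ASSUMED_OBJ_BYTES):
    """Approximate number of dentry objects behind a daily growth in MiB."""
    return growth_mb_per_day * 1024 * 1024 // obj_bytes


if __name__ == "__main__":
    print("slabs:", slabs_needed(OBJS, OBJ_PER_SLAB))
    print("cache size (K):", SLABS * ASSUMED_SLAB_KB)
    print("dentries per day at 225 MB/day:", dentries_per_day(225))
```

If the assumptions hold, the growth rate corresponds to more than a million new dentry objects per day, which is far more than the number of files an index of this size would normally touch.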

So I think the continuous increase in memory usage is due to slab usage. If
I restart ES, the slab memory is freed. I see that ES still has some free
heap available, but according to the Elastic documentation:

Lucene is designed to leverage the underlying OS for caching in-memory
data structures. Lucene segments are stored in individual files. Because
segments are immutable, these files never change. This makes them very
cache friendly, and the underlying OS will happily keep hot segments
resident in memory for faster access.

My question is: should I add more nodes, or increase the RAM of each node to
let Lucene use as much memory as it wants? How significant a performance
difference will there be if I upgrade the ES machines to have more RAM?

Or can I make some optimizations that decrease slab usage, or clear slab
memory partially?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5ccc7887-59f8-4267-ac05-450f00c42045%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Pradeep Reddy) #2

The ES version was actually 1.5.0. I have since upgraded to 1.5.2, and
restarting ES for the upgrade cleared the dentry cache.
I believe the dentry cache is handled by Linux, but it seems that ES/Lucene
plays a role in how it grows. If that is the case, ES/Lucene should be able
to control how large the dentry cache gets.

The dentry cache is continuously increasing. Is this unavoidable, considering
that the data is increasing every day (though not significantly)? I have an
ELK stack with many millions of documents, though fewer search requests to
the cluster, and it doesn't have this problem.

--
Please update your bookmarks! We moved to https://discuss.elastic.co/


(Mark Walkom) #3

When the underlying Lucene engine interacts with a segment, the OS will
leverage free system RAM and keep that segment in memory. However,
Elasticsearch/Lucene has no way to control OS-level caches.

What exactly is the problem here? This caching is what helps provide
performance for ES.


(Pradeep Reddy) #4

Hi Mark,

Thanks.

I understand that caching makes ES perform better, and that it's normal. What
I don't understand is the unusual size of the dentry cache for the amount of
data I have (it grows by 200+ MB per day). This behaviour doesn't occur on
the ELK ES cluster, which holds many times more data than this one.

Does that mean an unusual number of segments is being created? Is there
something that needs to be optimized?

The only thing that is different is that we take hourly snapshots directly
to S3. Is it possible that the S3 paths are also part of the dentry objects?
Could the number of snapshots have something to do with it? (I know that
having too many snapshots makes snapshotting slower.) Note that when I
restart ES, most of the cache gets cleared (maybe the OS clears it once it
sees that the parent process has stopped).


(Jörg Prante) #5

On my systems, dentry use is ~18MB while ES 1.5.2 is under heavy duty (RHEL
6.6, Java 8u45, on-premise server).

I think you should double check if the effect you see is caused by ES or by
your JVM/Arch Linux/EC2/whatever.

Jörg


(Pradeep Reddy) #6

Thanks Jörg,

Yes, it is unusual to have such a large dentry cache; there is definitely
something fishy going on. Stopping ES clears it up, so I believe it is
related to ES.


(Pradeep Reddy) #7

So the bloating of the dentry cache is because of this:
https://bugzilla.redhat.com/show_bug.cgi?id=1044666
My NSS version is 3.18 (Arch Linux, kernel version 3.14.21).

Setting NSS_SDB_USE_CACHE=YES has stopped the bloating. I have set this on
one of the three nodes, and its dentry size hasn't changed a bit (in fact
there was a small decrease), whereas the other two nodes have grown by
around 200 MB (in 18 hours).

At this point I am not sure which component of ES is making these curl
requests (maybe the cloud-aws plugin?).


(Pradeep Reddy) #9

Actually, the problem has appeared again. Memory consumption was stable for
a couple of days, then it started increasing. The env variable was apparently
only set for that particular session, so I had to set it again by adding it
to /etc/environment, but this doesn't have any effect anymore. There may be
some other parameter that's affecting the dentry cache.
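
One way to check whether a variable such as NSS_SDB_USE_CACHE actually reached the running ES process is to read that process's environment from /proc/<pid>/environ, which on Linux holds the NUL-separated environment the process was started with. The helper names below are hypothetical, and this is only a sketch; depending on how the service is started, a variable placed in /etc/environment may never be inherited by it.

```python
def parse_environ(raw):
    """Parse the NUL-separated KEY=VALUE bytes found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_has_var(pid, name, expected):
    """Return True if process pid was started with name set to expected.

    Reading another user's process environment usually requires root.
    """
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read()).get(name) == expected
```

For example, process_has_var(pid, "NSS_SDB_USE_CACHE", "YES") with the ES process ID would show whether the setting took effect for that process, as opposed to only for the shell session in which it was exported.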


(Pradeep Reddy) #10

I ran strace on Elasticsearch for a couple of minutes:
strace -fp PID -o file.txt

Out of the 40k+ events recorded,
2.2k+ resulted in errors like this

I think this is the reason for the dentry bloating, though I am not sure
whether something is wrong with my cluster.
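
To see which system calls fail most often in a capture like file.txt, the failed lines can be tallied by syscall name and errno. The sketch below assumes the usual strace output layout, where a failed call ends in "= -1 ERRNO (description)" and, with -f, each line starts with a PID. The function name is hypothetical, and lines for calls that were resumed after an interruption are counted under "?".

```python
import re
from collections import Counter

# A failed call ends with something like: = -1 ENOENT (No such file or directory)
ERROR_RE = re.compile(r"=\s+-1\s+(E[A-Z0-9]+)\b")
# With -f, each line starts with the PID, followed by the syscall name.
SYSCALL_RE = re.compile(r"^(?:\d+\s+)?(\w+)\(")


def count_errors(lines):
    """Count failed syscalls as (syscall, errno) pairs."""
    counts = Counter()
    for line in lines:
        err = ERROR_RE.search(line)
        if not err:
            continue
        call = SYSCALL_RE.match(line)
        name = call.group(1) if call else "?"
        counts[(name, err.group(1))] += 1
    return counts
```

Passing an open file object for the capture to count_errors and printing counts.most_common(10) gives a quick ranking. A large number of failed path lookups (for example ENOENT on stat or open) would be consistent with the dentry growth, since the kernel also caches lookups for paths that do not exist.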


(Zaar Hai) #11

vangap,

Did you manage to figure this out? We are hitting a similar issue on all our single-node (standalone) installations. The dentry cache is bloating memory and causing a stop-the-world effect when the kernel decides to clean it up.

Restarting the ES process releases the cache.

Elasticsearch 1.4.2, Oracle JRE 1.8.0_31, CentOS 6.4 64-bit.


(Zaar Hai) #12

This appears to be a CentOS-specific issue. I've run the same setup on Ubuntu and the dentry cache does not grow.

BTW, restarting does not release the dentry cache; removing the ES data dir does. My ES data is volatile and is recreated with each ES restart, which is what misled me into believing that the restart itself frees the cache.


(Pradeep Reddy) #13

As per the Elastic team, it's not an issue: https://github.com/elastic/elasticsearch/issues/11253 . We are on Arch Linux and are just letting it grow; we don't see any OOM errors. After a point it stops growing, as the kernel clears it up. Restarting ES does clear it for us, but that's useless and not advised.


(Zaar Hai) #14

Our nodes have a lot of RAM (for reasons not related to ES). So when the kernel decides to free a dentry cache containing hundreds of millions of entries, it causes stop-the-world pauses of more than a minute. That's how we bumped into the issue.

Nevertheless, I think it's pretty clear now, and it's good to have it documented here.


(Michael Owings) #15

I know I'm a little late here, but I have been looking at the same issue. However, it really isn't an issue. Basically, the dentry cache is available for use if it's needed; in fact, anything in SReclaimable will be freed for reuse if needed. In that sense, this memory is a lot like disk cache: you shouldn't count it against used memory.

The only problem is that this memory is not reported by the free utility (at least as of Ubuntu 14.04). This means that if you are running memory checks/alerts that use free to get the memory in use, you are going to see a lot of false alarms. For instance, on our 16G hosts, we can end up with 6G of memory as SReclaimable, but this 6G doesn't show up at all in free.

Note that you can free this memory with the following command (as root):

sync;echo 3 > /proc/sys/vm/drop_caches

That will free up the page, inode and dentry caches. But there's no real need to do this, and it probably has a short-term negative effect on performance. Better to just let the kernel release that memory as needed, and fix any alerting that relies on free.
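
For alerting, one option is to read /proc/meminfo directly and count SReclaimable together with the free and cached figures. The sketch below shows the idea; the function names and the choice of fields to add up are assumptions rather than an established formula, and newer kernels also expose a MemAvailable field in /proc/meminfo that may be a better single number where it exists.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> value in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def reclaimable_kb(info):
    """Memory the kernel can hand back on demand: free + caches + reclaimable slab."""
    return sum(info.get(k, 0) for k in ("MemFree", "Buffers", "Cached", "SReclaimable"))
```

Reading the file with open("/proc/meminfo").read() and passing the result through parse_meminfo and then reclaimable_kb gives a figure that does not raise an alarm merely because the dentry cache has grown.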


(Abinay) #16

@Michael_Owings My client said that about 500 MB of log files will be generated each day. To test whether my ELK stack can handle this, I mimicked the requirement as follows: I ran a bash script in an infinite loop that printed to a file. The content of the file was "{timestamp} local.INFO:{timestamp}". The whole ELK stack is installed on the same machine. What I am seeing now is that my RAM usage is increasing continuously. Currently htop (it's an Ubuntu 14.04 machine) reports memory usage of 2759/3764MB. What is the reason for this increase in memory? Can you explain it a bit more clearly? Please also specify the remedy for this.

