Hello everybody,
for the past few weeks we have been experiencing memory problems with
elasticsearch, and after running some tests we narrowed it down to faceting
on multi-valued string fields. We don't see OOMs, but memory slowly builds
up over time, and after a few days the JVM is just not able to free any of
it (leading to really long GC pauses). Attached is a picture of heap usage
for the two clusters. They are exactly the same (same hardware, settings,
data, queries, etc.), with the only difference being that on the first one
we removed faceting on the multi-valued string fields. The test has been
running since Friday the 15th until this morning, when the cluster already
started to struggle for memory.
We have the field data and filter caches limited (they amount to less than
5GB combined). Every node runs on a 40GB JVM heap, on servers with 64GB of
RAM and 24 cores. Each index has 4 shards, and each shard is about 8GB on
disk.
Any suggestions on where to look? Thanks
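For context, the cache limits are set in elasticsearch.yml with settings
along these lines (the values here are illustrative, not our exact config,
and the setting names are as I understand them for the 0.20.x line):

    # node-level filter cache cap
    indices.cache.filter.size: 1gb
    # per-index field data cache, capped by entry count
    index.cache.field.type: resident
    index.cache.field.max_size: 50000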
There is a known issue regarding facets on multi-valued fields:
Do your documents vary in the number of terms in the faceted field? People
who had high variance tended to have the most issues.
Facets have been re-architected in master (0.21). Shay posted something
about the release last week. You can try running master to see if it helps.
--
Ivan
Hey Ivan,
thanks for your reply. I had read that before, and even though I also see
some high memory usage, that isn't really a problem for me (although lower
memory requirements would be nice). What worries me more is that memory
seems to keep building up: with each iteration of the CMS GC, less memory
is actually freed, until the JVM runs really tight on memory and eventually
collapses under really long GC pauses. Thanks anyway
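For anyone wanting to see the same pattern, we watch the CMS behaviour by
turning on GC logging. This is roughly what we add (standard HotSpot flags,
passed via ES_JAVA_OPTS or by editing bin/elasticsearch.in.sh; the log path
is just an example):

    export ES_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails \
      -XX:+PrintGCDateStamps -Xloggc:/var/log/elasticsearch/gc.log"

The log shows how much of the old generation each CMS cycle reclaims, which
is where the "less freed each time" trend becomes visible.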
Note that the field data "cache" isn't really a cache. It doesn't get
freed because, chances are, you're just going to need that data again
the next time you run the query anyway.
For this reason, in the next version of ES, field data is no longer
referred to as a cache.
Also in the next version, memory usage for multi-valued fields is much
better than in the current version. All I can suggest for now is to add
nodes (or RAM).
Hey Clinton,
thanks for the reply. Whether it is considered a "cache" or not, it should
still be reported by the stats, right? I mean, I see around 3GB reported as
being used for field data and 1GB for the filter cache, but we have a total
of 40GB of heap.
Regarding 0.21, we are already preparing our systems to try it, since we
are not live yet and can just try it. Anyway, if you have any suggestion on
where to look for the "rest" of the memory...
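For completeness, this is roughly how we read those numbers (on the 0.20.x
line the node stats live under _cluster/nodes; the flags and field names
may differ a bit between versions):

    curl 'localhost:9200/_cluster/nodes/stats?indices=true&jvm=true&pretty=true'

The indices section reports the field data and filter cache sizes, and the
jvm section shows heap used vs. committed per node.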
OK - if it is reporting it as 3GB then that should be all that is being
used for field values (you haven't got soft refs turned on, have you?).
Btw, you really don't want to use a 40GB heap. Below 32GB Java can use
compressed pointers; above that you're wasting space and making GC more
difficult.
Are you using mmapfs? If not, consider doing that and reducing your heap
size.
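Something along these lines, as a sketch (setting names as they are on the
0.20.x line, values illustrative):

    # elasticsearch.yml - open Lucene files via mmap instead of NIO
    index.store.type: mmapfs

    # keep the heap under the compressed-oops threshold, e.g. in the
    # environment the startup script reads
    export ES_HEAP_SIZE=30g

You can check whether compressed pointers are actually in effect with
something like:

    java -Xmx30g -XX:+PrintFlagsFinal -version | grep UseCompressedOops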
We tried 30GB but ran into the same problem, which is why we actually
increased it to 40GB. About mmapfs, isn't nio the recommended store type?
Any reason to believe nio might have a problem?
You're on 64 bit?
If so, give this a read:
but either way, you don't want your heap above 30GB. Rather leave the
rest of the RAM for your file system caches.
Clinton,
excuse my french, but you are a fucking genius.
Changing the store type to mmapfs radically changed the memory buildup.
It's still a bit soon to say whether it really fixed the problem, but we
have a test that would consistently make the problem appear in about 5h,
and it's now been running for more than 24h and is still OK.
I don't know if other people have experienced this before, but maybe mmapfs
should be the default, or at least the recommended, setting for 64-bit
systems?
thanks again
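PS: in case it's useful to anyone, the quick sanity check we used to confirm
the segments were actually memory-mapped (plain Linux, assuming the default
data path and the 0.20-era main class name):

    pmap -x $(pgrep -f org.elasticsearch.bootstrap.ElasticSearch) | grep -i 'elasticsearch/data'

With mmapfs in effect, the Lucene segment files show up as file-backed
mappings there; with niofs they don't.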