My cluster has been running into memory issues: garbage-collection
loops, stop-the-world pauses, etc.
In a test cluster I ran a few experiments. After a jmap heap dump I've
determined that org.elasticsearch.index.percolator.PercolatorQueriesRegistry is taking up
nearly 40% of my heap, even though my percolator queries are a fraction of
the size of the *regular* documents I'm storing.
I understand that percolator queries are always kept in memory
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html#_how_it_works_under_the_hood),
and I'm trying to plan accordingly, but to put things in perspective: the
index I'm percolating against contains documents that are ~317 MB on disk
yet take up ~3 GB in memory. I've determined this ratio from the jmap
output and by simply watching the heap size before and after opening the
index with the queries.
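For anyone wanting to reproduce the measurement, this is roughly how the jmap output can be tallied up -- a minimal sketch, assuming the standard `jmap -histo` column layout (rank, #instances, #bytes, class name); the sample lines are illustrative, not real output:

```python
# Minimal sketch: total up the bytes jmap attributes to percolator classes.
# Assumes the standard `jmap -histo:live <pid>` column layout:
#   rank:  #instances  #bytes  class name
def percolator_bytes(histo_lines):
    total = 0
    for line in histo_lines:
        parts = line.split()
        if len(parts) >= 4 and "percolator" in parts[3].lower():
            total += int(parts[2])  # third column is #bytes
    return total

# Illustrative sample lines (not real output).
sample = [
    "   1:  1200  2048000  org.elasticsearch.index.percolator.PercolatorQueriesRegistry",
    "   2:  5000  1024000  java.lang.String",
]
print(percolator_bytes(sample))  # prints 2048000: only the percolator line counts
```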
My test cluster consists of a single node (v1.0.1) and the index I'm
storing percolator queries in has 5 shards and 0 replicas.
Question
A nearly 10:1 ratio of memory usage to disk usage seems wrong to me. Is
there something specific about the way percolator documents are stored
under the hood that makes them take up so much memory compared to their
JSON representations on disk?
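To make the arithmetic behind that figure explicit (a rough sanity check on the numbers above, nothing more):

```python
# Rough sanity check of the reported blow-up: ~317 MB on disk vs ~3 GB of
# heap once the percolator queries are loaded.
disk_bytes = 317 * 1024 ** 2   # ~317 MB on disk
heap_bytes = 3 * 1024 ** 3     # ~3 GB on the heap
ratio = heap_bytes / disk_bytes
print(round(ratio, 1))  # prints 9.7 -- hence "nearly 10:1"
```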
To clarify, the "documents" I'm referring to as being stored in "the index
I'm percolating against" are my .percolator indexed queries, and there are
no other documents stored in said index.
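For reference, each such query lives in the index as an ordinary JSON document under the reserved .percolator type -- a sketch (the index name and query below are made up for illustration):

```python
import json

# Hypothetical percolator query document. On a 1.x cluster it would be
# registered with something like:
#   PUT /my-index/.percolator/1
# The document *is* the query, which is why the index holds nothing but queries.
doc = {"query": {"match": {"message": "error"}}}
payload = json.dumps(doc, sort_keys=True)
print(payload)  # prints {"query": {"match": {"message": "error"}}}
```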
On Jul 11, 2014 6:21 PM, "Adam Georgiou" apg552@gmail.com wrote:
Going to try and keep this concise.
Issue (Potential bug?)
Is this consistent with other people's experience? Would some charts --
heap usage, disk usage, etc. -- make this more approachable?
On Jul 11, 2014 6:36 PM, "Adam Georgiou" me@adamgeorgiou.com wrote:
To clarify, the "documents" I'm referring to as being stored in "the index
I'm percolating against" are my .percolator indexed queries, and there are
no other documents stored in said index.
You may want to raise an issue on GitHub if you are still concerned.
This is a community mailing list, so we answer as best we can, when we can,
and it may just be that no one else has seen this situation.
I've been experiencing the same issue for a few days now, with a large data cluster! Deactivating the percolator queries immediately fixed the garbage-collection problem.
I opened an issue on GitHub to learn more about this, because I really need this awesome functionality!
I spent a lot of time trying to optimize the garbage collector for nothing...