My cluster has been running into memory issues: garbage-collection
loops, stop-the-world pauses, etc.
In a test cluster I ran a few experiments. After a jmap heap dump I've
determined that org.elasticsearch.index.percolator.PercolatorQueriesRegistry is taking up
nearly 40% of my heap, even though my percolator queries are a fraction of
the size of the *regular* documents I'm storing.
I understand that percolator queries are always kept in memory
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html#_how_it_works_under_the_hood),
and I'm trying to plan accordingly, but to put things in perspective: the
index I'm percolating against contains documents that are ~317 MB on disk
yet take up ~3 GB in memory. I've determined this ratio from the jmap
output and by simply watching the heap size before and after opening the
index with the queries.
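For anyone wanting to reproduce the measurement, this is roughly how the jmap output can be tallied up -- a minimal sketch, assuming the standard `jmap -histo` column layout (rank, #instances, #bytes, class name); the sample lines are illustrative, not real output:

```python
# Minimal sketch: total up the bytes jmap attributes to percolator classes.
# Assumes the standard `jmap -histo:live <pid>` column layout:
#   rank:  #instances  #bytes  class name
def percolator_bytes(histo_lines):
    total = 0
    for line in histo_lines:
        parts = line.split()
        if len(parts) >= 4 and "percolator" in parts[3].lower():
            total += int(parts[2])  # third column is #bytes
    return total

# Illustrative sample lines (not real output).
sample = [
    "   1:  1200  2048000  org.elasticsearch.index.percolator.PercolatorQueriesRegistry",
    "   2:  5000  1024000  java.lang.String",
]
print(percolator_bytes(sample))  # prints 2048000: only the percolator line counts
```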
My test cluster consists of a single node (v1.0.1) and the index I'm
storing percolator queries in has 5 shards and 0 replicas.
Question
A nearly 10:1 ratio of memory usage to disk usage seems wrong to me. Is
there something specific about the way percolator documents are stored
under the hood that makes them take up so much memory compared to their
JSON representations on disk?
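To make the arithmetic behind that figure explicit (a rough sanity check on the numbers above, nothing more):

```python
# Rough sanity check of the reported blow-up: ~317 MB on disk vs ~3 GB of
# heap once the percolator queries are loaded.
disk_bytes = 317 * 1024 ** 2   # ~317 MB on disk
heap_bytes = 3 * 1024 ** 3     # ~3 GB on the heap
ratio = heap_bytes / disk_bytes
print(round(ratio, 1))  # prints 9.7 -- hence "nearly 10:1"
```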
To clarify, the "documents" I'm referring to as being stored in "the index
I'm percolating against" are my .percolator indexed queries, and there are
no other documents stored in said index.
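For reference, each such query lives in the index as an ordinary JSON document under the reserved .percolator type -- a sketch (the index name and query below are made up for illustration):

```python
import json

# Hypothetical percolator query document. On a 1.x cluster it would be
# registered with something like:
#   PUT /my-index/.percolator/1
# The document *is* the query, which is why the index holds nothing but queries.
doc = {"query": {"match": {"message": "error"}}}
payload = json.dumps(doc, sort_keys=True)
print(payload)  # prints {"query": {"match": {"message": "error"}}}
```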
On Jul 11, 2014 6:21 PM, "Adam Georgiou" apg552@gmail.com wrote:
Going to try and keep this concise.
Issue (Potential bug?)
Is this consistent with other people's experience? Would some charts --
heap usage, disk usage, etc. -- make this more approachable?
On Jul 11, 2014 6:36 PM, "Adam Georgiou" me@adamgeorgiou.com wrote:
To clarify, the "documents" I'm referring to as being stored in "the index
I'm percolating against" are my .percolator indexed queries, and there are
no other documents stored in said index.
You may want to raise an issue on GitHub if you are still concerned.
This is a community mailing list, so we answer as best we can, when we can,
and it may just be that no one else has seen this situation.
I've been experiencing the same issue for a few days now, with a large data cluster! Deactivating the percolator queries immediately fixed the garbage-collection problem.
I opened an issue on GitHub to learn more about this, because I really need this awesome functionality!
I spent a lot of time trying to optimize the garbage collector for nothing...