Percolator Memory Usage -- 10-1 Memory-to-Disk Ratio. Why?

Going to try and keep this concise.

Issue (Potential bug?)

  • My cluster has been running into memory issues: garbage collection
    loops, stop-the-world pauses, etc.
  • In a test cluster I ran a few experiments. After running jmap, I've
    determined that
    org.elasticsearch.index.percolator.PercolatorQueriesRegistry is taking up
    nearly 40% of my heap, even though my percolator queries are a fraction of
    the size of the regular documents I'm storing.
  • I understand that percolator queries are always kept in memory
    (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html#_how_it_works_under_the_hood),
    and I'm trying to plan accordingly. To put things in perspective, the
    index I'm percolating on contains documents that are ~317MB on disk
    and take up ~3GB in memory. I determined this ratio from jmap
    output and by watching the heap size before and after opening the
    index with the queries.
  • My test cluster consists of a single node (v1.0.1), and the index I'm
    storing percolator queries in has 5 shards and 0 replicas.
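
For anyone who wants to repeat the jmap part of this measurement, here is a minimal sketch of summing a `jmap -histo` class histogram by class-name prefix. The function name, the prefix, and the sample rows are all hypothetical and are not output from the cluster described above. Note that `jmap -histo` reports shallow sizes per class, so attributing retained memory to a single owner such as PercolatorQueriesRegistry would still need a heap dump and an analyzer.

```python
import re

# Matches data rows of `jmap -histo` output: rank, instance count, bytes, class name.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def shallow_bytes_by_prefix(histo_text, prefixes):
    """Sum shallow bytes per class-name prefix; also return the grand total."""
    totals = {p: 0 for p in prefixes}
    grand_total = 0
    for line in histo_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skips the header, the separator, and the "Total" line
        size, name = int(m.group(2)), m.group(3)
        grand_total += size
        for p in prefixes:
            if name.startswith(p):
                totals[p] += size
    return totals, grand_total


# Hypothetical sample rows, made up for illustration only.
SAMPLE = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:          1000         400000  org.apache.lucene.search.TermQuery
   2:           500         100000  org.apache.lucene.search.BooleanQuery
   3:          2000         500000  [C
Total          3500        1000000
"""

totals, grand = shallow_bytes_by_prefix(SAMPLE, ["org.apache.lucene.search."])
share = totals["org.apache.lucene.search."] / grand
print(f"{share:.0%} of shallow heap")
```

In practice the histogram text would come from a saved `jmap -histo <pid>` run, taken once before and once after opening the index with the queries.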

Question

A nearly 10-1 ratio of memory usage to disk usage seems wrong to me. Is
there something specific about the way percolator documents are stored
under the hood that makes them take up so much memory compared to the way
their JSON representations are stored on disk?
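
As a rough check of that ratio, here is the arithmetic on the two figures reported above. It assumes binary units (MiB and GiB) and reads "M" and "Gb" as megabytes and gigabytes, which the post does not state explicitly. The per-shard figure further assumes the queries are spread evenly across the 5 shards, which is only an assumption.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

disk_bytes = 317 * MIB  # ~317M on disk, as reported above
heap_bytes = 3 * GIB    # ~3Gb in memory, as reported above
shards = 5              # shard count of the percolator index

# Memory-to-disk ratio for the whole index.
ratio = heap_bytes / disk_bytes

# Heap per shard, assuming an even spread of queries (an assumption).
heap_per_shard_mib = heap_bytes / shards / MIB

print(f"memory-to-disk ratio: {ratio:.1f}x")
print(f"heap per shard: {heap_per_shard_mib:.1f} MiB")
```

Both inputs are approximate, so the result is only good to about one significant figure, which is consistent with "nearly 10-1".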

-Adam

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4ee474c6-9aa8-4a50-b140-d30860ff98fa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

To clarify, the "documents" I'm referring to as being stored in "the index
I'm percolating against" are my .percolator indexed queries, and there are
no other documents stored in said index.

Is this consistent with other people's experience? Would some charts --
heap usage, disk usage, etc. -- make this more approachable?

You may want to raise an issue on GitHub if you are still concerned.
This is a community mailing list, so we answer as best we can when we can,
and it may just be that no one else has seen this situation :)

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


I have been experiencing the same issue for a few days, on a large data cluster. Deactivating percolator queries immediately reduced the garbage collector issue.

I opened an issue on GitHub to learn more about this, because I really need this awesome functionality!

I spent a lot of time trying to optimize garbage collection for nothing...
