Running term_stats facets on nested mapping objects: out of memory error


(drv_) #1

I'm having trouble getting terms_stats facets to run on documents that
contain nested objects. The index is about 120GB in size, with around
50-100 nested objects per parent document. I consistently get a Java
heap space error, even when the query on its own returns only a dozen
or so results. I can run statistical facets on root object properties,
but not on the nested objects. There are around 300k unique process
names, and my queries usually narrow that down to around 100k.

ElasticSearch 0.17.4 is running as a single node and is set to use 40GB of memory.

My questions are:

  1. Am I misunderstanding how facets run? I thought facets would only
    execute on the results produced by the query they're attached to.
  2. I should be able to get around this with a value_script in the
    facet, but what is the syntax for selecting nested document
    properties?
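Since the gist itself isn't reproduced here, the following is a rough sketch of the kind of request being described, built as a Python dict and printed as JSON. Everything about the data is an assumption: a hypothetical nested path processes, hypothetical fields processes.name (the process name used as the facet key) and processes.cpu (a numeric value to compute stats on), and a made-up query string. It also assumes the facet-level nested option, which runs a facet over the nested documents of each matching parent; check that your ElasticSearch version supports it before relying on this shape.

```python
import json


def build_terms_stats_request(query_string, key_field, value_field, nested_path, size=10):
    """Build a search body with a terms_stats facet over nested documents.

    The field names and nested path passed in are hypothetical placeholders.
    """
    return {
        # The query selects parent documents; the facet only aggregates
        # over the documents this query matches.
        "query": {"query_string": {"query": query_string}},
        "facets": {
            "process_stats": {
                "terms_stats": {
                    "key_field": key_field,      # term to group by
                    "value_field": value_field,  # numeric field to compute stats on
                    "size": size,                # number of terms to return
                },
                # Assumed facet-level option: run the facet on the nested
                # documents of each matching parent.
                "nested": nested_path,
            }
        },
    }


# Hypothetical query and field names, for illustration only.
body = build_terms_stats_request("host:web01", "processes.name", "processes.cpu", "processes")
print(json.dumps(body, indent=2))
```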

Here is a gist of the general document and the types of queries I'm
trying to run:
https://gist.github.com/1141158

Thanks for any help or pointers!
scott


(Shay Banon) #2

Can you try not mapping the nested object with include_in_parent? Including
it in the parent causes more memory to be used when the field's values are
loaded for the terms stats facet.
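A minimal sketch of the mapping change suggested above, using the same hypothetical processes fields and a hypothetical type name process_snapshot. The nested object is mapped with include_in_parent set to false, which is also the default, so omitting it should have the same effect. The not_analyzed setting on name is an added assumption, so that each process name is faceted as a single term rather than split into tokens.

```python
import json


def nested_mapping(type_name, nested_path, include_in_parent):
    """Return a put-mapping body for a type with one nested object field.

    type_name, nested_path and the inner field names are hypothetical.
    """
    return {
        type_name: {
            "properties": {
                nested_path: {
                    "type": "nested",
                    # When True, the nested fields are also indexed on the
                    # parent document, so each parent carries every process
                    # value as a multi-valued field: an extra copy that has
                    # to be loaded into memory for faceting.
                    "include_in_parent": include_in_parent,
                    "properties": {
                        # Assumption: keep whole process names as single terms.
                        "name": {"type": "string", "index": "not_analyzed"},
                        "cpu": {"type": "double"},
                    },
                }
            }
        }
    }


suggested = nested_mapping("process_snapshot", "processes", include_in_parent=False)
print(json.dumps(suggested, indent=2))
```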

On Fri, Aug 12, 2011 at 3:23 AM, drv_ scottyahoo@gmail.com wrote:


(drv_) #3

I'll try that, thanks for the suggestion Shay. I'll let you know how
it works after loading finishes. :)

Did the nested facet query look right? Any obvious errors? I guess
I'm still confused about how facets are executed: I thought they would
run only against the result set returned by the attached query. Is
that correct? Thanks!

On Aug 12, 7:34 am, Shay Banon kim...@gmail.com wrote:


(Shay Banon) #4

The facet only runs against the results of the query, but all the values
for those fields, for every document and not just the matching ones, are
loaded into memory for fast facet calculation.
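To make the distinction concrete, here is a toy Python model (not ElasticSearch code, and not its real data structures) of the behaviour described above: the first time a field is faceted on, its values are loaded for every document in the index, and only then does the facet aggregate over the documents the query matched. Memory use therefore scales with the size of the index, not with the number of hits. The documents and values are made up.

```python
from collections import defaultdict

# Toy "index": 1,000 nested process documents with made-up data.
INDEX = {i: {"name": f"proc{i % 50}", "cpu": float(i % 7)} for i in range(1000)}


class ToyFieldCache:
    """Toy per-field value cache; a simplification, not ES internals."""

    def __init__(self, index):
        self.index = index
        self.columns = {}  # field -> {doc_id: value}

    def column(self, field):
        # On first use, load the field for EVERY document in the index,
        # no matter how few documents the current query matched.
        if field not in self.columns:
            self.columns[field] = {d: doc[field] for d, doc in self.index.items()}
        return self.columns[field]

    def loaded_values(self):
        return sum(len(col) for col in self.columns.values())


def toy_terms_stats(cache, hits, key_field, value_field):
    """Compute count/total/mean of value_field per key_field term, over hits only."""
    keys = cache.column(key_field)
    values = cache.column(value_field)
    stats = defaultdict(lambda: {"count": 0, "total": 0.0})
    for doc_id in hits:
        entry = stats[keys[doc_id]]
        entry["count"] += 1
        entry["total"] += values[doc_id]
    for entry in stats.values():
        entry["mean"] = entry["total"] / entry["count"]
    return dict(stats)


cache = ToyFieldCache(INDEX)
hits = [0, 7, 50, 57]  # the query matched only four documents
result = toy_terms_stats(cache, hits, "name", "cpu")
print(result)
print("values held in memory:", cache.loaded_values())  # 2000, for a 4-hit result
```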

On Fri, Aug 12, 2011 at 9:29 PM, drv_ scottyahoo@gmail.com wrote: