Aggregations

Hi,

I am fairly new to Elasticsearch. While testing Elasticsearch's aggregation
feature, I keep hitting "data too large" errors. I understand that
aggregations are very memory intensive, so is there a way to query ES so that
one query's output is fed into an aggregation, limiting the number of inputs
to the aggregation? I have tried using a query and a filter before the
aggregations.

I have an index of around 60 GB on 5 shards.

Queries I tried:

GET **********/_search
{
  "query": {
    "term": {
      "file_sha2": {
        "value": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      }
    }
  },
  "aggs": {
    "top_filename": {
      "max": {
        "field": "portalid"
      }
    }
  }
}


GET ************/_search
{
  "aggs": {
    "top filename": {
      "filter": {
        "term": {
          "file_sha2": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        }
      },
      "aggs": {
        "top_filename": {
          "max": {
            "field": "portalid"
          }
        }
      }
    }
  }
}

Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALOF%3DH5%2BCzGZzhiyzy8ixnY_CcreL_3XaJf9jf4RJTvVH4Jx%3Dg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Hi,

It sounds like your heap size might be too low. How much memory have you
assigned to the heap (i.e. what have you set as ES_HEAP_SIZE)? To perform
aggregations, Elasticsearch has to load the values of a field for every
document into memory, in a data structure called field data. It sounds like
you are hitting the circuit breaker, which prevents this data structure from
using too much of the heap and causing an OOM error.
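
To make the idea concrete, here is a rough sketch of what a circuit breaker of this kind does: it compares an estimate of the data about to be loaded against a fraction of the heap and refuses the load up front. This is a toy model, not Elasticsearch's actual code. The names check_breaker and CircuitBreakingError, the 4 GiB heap, and the 0.6 fraction are all assumptions made for illustration.

```python
class CircuitBreakingError(Exception):
    """Toy stand-in for the error raised when the breaker trips."""


def check_breaker(estimated_bytes, used_bytes, heap_bytes, limit_fraction=0.6):
    """Return the new usage, or raise if the load would exceed the limit."""
    limit = int(heap_bytes * limit_fraction)
    if used_bytes + estimated_bytes > limit:
        raise CircuitBreakingError(
            "Data too large: would use %d bytes, limit is %d bytes"
            % (used_bytes + estimated_bytes, limit)
        )
    return used_bytes + estimated_bytes


GIB = 1024 ** 3
heap = 4 * GIB  # hypothetical heap size, not taken from this thread

# A 1 GiB load fits under the limit (60% of 4 GiB is about 2.4 GiB).
used = check_breaker(1 * GIB, 0, heap)

# A further 2 GiB would bring usage to 3 GiB, so the breaker refuses it.
try:
    check_breaker(2 * GIB, used, heap)
    tripped = False
except CircuitBreakingError as err:
    tripped = True
    print(err)
```

The point of rejecting early is that a failed request is recoverable, while a node that runs out of heap is not.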

Colin

Thank you for the reply. My heap size is 8 GB for a 74 GB index, and yes, I
am hitting the circuit breaker.
So when I query or filter before the aggregations, are the aggregations
passed only the filtered/query results?

I am asking because the output of my query or filter contains very few
entries (in the hundreds). If it still hits the OOM error, then the
aggregation must be loading everything into the cache regardless of the
preceding query or filter.

What version of ES are you using? AFAIK, in later versions you can control
the percentage of heap space the field data circuit breaker allows through
the update settings API. Try increasing it a bit and see what happens. The
default is 60%; increase it to, for example, 70%:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html#fielddata-circuit-breaker
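
As a rough sketch, the request body for such an update could be built as below, along with the number of bytes each percentage works out to. The setting name indices.fielddata.breaker.limit is what I believe the 1.x releases up to 1.3 use; treat it as an assumption and check the page above for your version, since later releases renamed it. The helper names are made up for this example, and the 8 GB heap is the figure given earlier in this thread.

```python
import json

# Assumed setting name for the 1.x line up to 1.3; verify for your version.
SETTING = "indices.fielddata.breaker.limit"


def breaker_update_body(percent, persistent=False):
    """Build the JSON body for a PUT /_cluster/settings request."""
    scope = "persistent" if persistent else "transient"
    return {scope: {SETTING: "%d%%" % percent}}


def limit_bytes(heap_bytes, percent):
    """Heap bytes the breaker would allow at the given percentage."""
    return heap_bytes * percent // 100


body = breaker_update_body(70)
print("PUT /_cluster/settings")
print(json.dumps(body, indent=2))

heap = 8 * 1024 ** 3  # 8 GB heap, as reported in this thread
for pct in (60, 70, 80):
    print("%d%% -> %.1f GiB" % (pct, limit_bytes(heap, pct) / 1024.0 ** 3))
```

Raising the limit only helps if the field data actually fits in the extra room; if it does not, the node is simply allowed to get closer to running out of heap.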

T.

Sorry for the delayed response.
I am using version 1.3. I was able to change the field data circuit breaker
limit; I set it to 80, and this is a nice setting to know about.
But it doesn't work. Maybe heap size is my problem, but I have very
limited heap space.

Thank you.

Field data does indeed load all the values for a field into memory
irrespective of the query and filter. This is how aggregations achieve
fast lookups on the values of a field for a particular document. The field
data for a field is loaded the first time it is needed and then kept in a
cache.
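
A toy model may make this clearer. The sketch below is not how Elasticsearch is implemented; ToyFieldData and max_agg are hypothetical names and the documents are made-up data. It only illustrates the behaviour described above: the first aggregation on a field loads that field's value for every document, even when the filter matches only a handful.

```python
class ToyFieldData:
    """Toy stand-in for field data: one in-memory column per field."""

    def __init__(self, stored_docs):
        self._stored_docs = stored_docs  # list of {field: value}, indexed by doc id
        self._columns = {}               # field -> list of values, cached

    def column(self, field):
        # First use of a field loads its value for every document,
        # whether or not the current query matches that document.
        if field not in self._columns:
            self._columns[field] = [doc[field] for doc in self._stored_docs]
        return self._columns[field]

    def loaded_values(self, field):
        """Number of values currently held in memory for the field."""
        return len(self._columns.get(field, []))


def max_agg(fielddata, field, matching_doc_ids):
    """Max of the field over the documents the query or filter matched."""
    column = fielddata.column(field)
    return max(column[doc_id] for doc_id in matching_doc_ids)


# Hypothetical data: 100,000 documents, only 3 of which match the filter.
docs = [{"portalid": i % 977} for i in range(100000)]
fielddata = ToyFieldData(docs)
result = max_agg(fielddata, "portalid", [10, 2000, 50000])
print(result, fielddata.loaded_values("portalid"))
```

The aggregation reads three values, but the column held in memory has one entry per document in the index, which is why a selective query does not reduce the memory needed.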

Heap size is almost certainly your problem here. There are 2 options I can
see for you:

  1. Increase your heap size to allow enough space to load the field data
    into memory.
  2. Try setting the field data format to 'doc_values' (described here:
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html).
    Note that doc_values uses less heap memory but consumes more disk space
    and may be slightly slower, so it may or may not suit your needs.
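
For option 2, a mapping along the following lines is roughly what the page above describes for 1.x. This is a sketch under assumptions: the mapping type name "doc" and the field type "long" are placeholders, since the thread does not say what the real mapping looks like, and the helper name is made up.

```python
import json


def doc_values_mapping(type_name, field, field_type):
    """Build a mapping body that requests doc_values as the field data format."""
    return {
        type_name: {
            "properties": {
                field: {
                    "type": field_type,
                    "fielddata": {"format": "doc_values"},
                }
            }
        }
    }


# "doc" and "long" are assumptions; substitute the real type name and the
# actual data type of portalid from your own mapping.
mapping = doc_values_mapping("doc", "portalid", "long")
print(json.dumps(mapping, indent=2))
```

As far as I know, doc values are written at index time, so this would need to be in place when the index is created and the existing data would have to be reindexed.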

Regards,

Colin
