Yet another facet/memory question

I would like to facet on a user defined "keyword" field, but I know that
high cardinality fields can cause high memory usage in elasticsearch.

  • I see that starting in 0.90 there is an indices.fielddata.cache.size
    parameter. If this is bounded, will it prevent all OOM issues, or will an
    OOM still occur if the values for a single field can't fit in memory?
  • Also, is it still true that all facet values are loaded into memory
    upon facet execution? If so, what is the purpose of the fielddata filter?
    Wouldn't the values filtered from memory need to be accessed on disk again
    at some point anyway? Or are the filtered values simply excluded from
    consideration during facet execution?
  • I see some older posts mention memory inefficiencies for multi-valued
    fields. Is this still the case in 0.90?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#fielddata-filters

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b8c03166-a72d-4053-9af0-6a69e3e30cfb%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

The indices.fielddata.cache.size parameter indeed controls the size of the
cache. It does not, however, prevent data from being loaded into memory if a
request needs it. In 1.0 a circuit breaker was introduced to fail requests
that try to load too much fielddata into memory (see the fielddata circuit
breaker section of the reference documentation).
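For reference, a minimal sketch of what those two settings might look like in elasticsearch.yml. The breaker setting name here (indices.fielddata.breaker.limit) is the 1.0-era name, so check the reference docs for your version:

```yaml
# Bound the fielddata cache so old entries get evicted (0.90+).
# Accepts a percentage of heap or an absolute value like "2gb".
indices.fielddata.cache.size: 40%

# 1.0+ circuit breaker: a request estimated to need more fielddata
# than this fraction of the heap fails instead of triggering an OOM.
indices.fielddata.breaker.limit: 60%
```

Note the two are complementary: the cache size only evicts after loading, while the breaker rejects a request before it loads too much.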

In general, once you facet on a field, all of its values are loaded into
memory. This is based on the assumption that the field will be used
repeatedly by different requests needing various values. Sometimes not all
values are interesting (think of a very common keyword, or maybe you want to
exclude all values that start with a dot). You can use fielddata filters to
indicate that those values should not be used and not loaded into memory;
subsequent requests will not cause them to be loaded either.
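As an illustration, a 0.90-style mapping sketch using a fielddata filter. The type and field names are made up, and the thresholds are only examples:

```json
{
  "tweet": {
    "properties": {
      "tag": {
        "type": "string",
        "index": "not_analyzed",
        "fielddata": {
          "filter": {
            "frequency": {
              "min": 0.001,
              "min_segment_size": 500
            },
            "regex": {
              "pattern": "^[^.].*"
            }
          }
        }
      }
    }
  }
}
```

Here the frequency filter skips terms appearing in fewer than 0.1% of a segment's documents (ignoring segments smaller than 500 docs), and the regex filter keeps only terms that do not start with a dot; terms excluded by a filter are never loaded into fielddata.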

I'm not sure which posts you refer to, but 0.90 has indeed massively
improved memory usage compared to previous versions.

Cheers,
Boaz

On Thursday, February 6, 2014 12:42:31 AM UTC+1, slushi wrote:
