I would like to facet on a user-defined "keyword" field, but I know that
high-cardinality fields can cause high memory usage in Elasticsearch.
I see that starting in 0.90 there is an indices.fielddata.cache.size
parameter. If this is bounded, will it prevent all OOM issues, or will it
still cause an OOM if the values for a single field can't fit in memory?
Also, is it still true that all facet values are loaded into memory
upon facet execution? If so, what is the purpose of the fielddata filter?
Wouldn't the values filtered from memory need to be accessed on disk again
at some point anyway? Or are the filtered values simply excluded from
consideration during facet execution?
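For reference, this is the kind of request I mean: a terms facet on the
keyword field. A minimal sketch using the 0.90 facets API (the field and
facet names here are made up):

    {
      "query": { "match_all": {} },
      "facets": {
        "keywords": {
          "terms": { "field": "keyword_field", "size": 10 }
        }
      }
    }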
I also see some older posts that mention memory inefficiencies for
multi-valued fields. Is this still the case in 0.90?
The indices.fielddata.cache.size parameter indeed controls the size of the
cache. It does not, however, prevent the loading of data into memory if a
request needs it, so a single expensive request can still exhaust the heap.
In 1.0 a circuit breaker was introduced to fail requests that try to load
too much field data into memory (see the fielddata circuit breaker section
of the Elasticsearch documentation).
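For example, a minimal elasticsearch.yml sketch (setting names as they
existed around 0.90/1.0; the breaker setting name and its default may
differ in your exact version, so check the docs for your release):

    # Bound the fielddata cache; old entries are evicted when the
    # limit is reached (available since 0.90)
    indices.fielddata.cache.size: 40%

    # Fail requests that would push fielddata past this limit,
    # instead of OOMing (introduced in 1.0)
    indices.fielddata.breaker.limit: 60%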
In general, once you facet on a field, all of its values are loaded into
memory. This is based on the assumption that the field will be used
repeatedly by different requests needing various values. Sometimes not all
values are interesting (think of a very common keyword, or maybe you want
to exclude all values that start with a dot). You can use fielddata filters
to indicate that those values should not be loaded into memory; subsequent
requests will not cause them to be loaded either, and they are simply
excluded from consideration during facet execution.
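As an example, here is a fielddata frequency filter in the mapping (a
sketch based on the 0.90 mapping syntax; the type name, field name, and
thresholds are made up for illustration):

    {
      "my_type": {
        "properties": {
          "keyword_field": {
            "type": "string",
            "index": "not_analyzed",
            "fielddata": {
              "filter": {
                "frequency": {
                  "min": 0.001,
                  "max": 0.1,
                  "min_segment_size": 500
                }
              }
            }
          }
        }
      }
    }

Terms whose document frequency falls outside the given range are never
loaded into fielddata, so they never appear in facet results for that
field and are never fetched from disk again for faceting.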
I'm not sure which posts you are referring to, but 0.90 has indeed
massively improved memory usage compared to previous versions.
Cheers,
Boaz