Memory Usage in (Date) Histogram Facet


(Otis Gospodnetić) #1

Hi,

Reading "Memory Considerations" section at the bottom of
http://www.elasticsearch.org/guide/reference/api/search/facets/histogram-facet.html ....
and want to confirm:

Just because you use the interval functionality to group values from a
field into buckets does not mean the amount of memory needed is reduced.
For example, if there are 1000 distinct values in field X and you then do a
histogram facet on X that groups values into 100 buckets, the memory needed
to hold this is still the same as if no interval were used for bucketing.

In other words, even if you "bucketize", it is worth reducing the number of
distinct values in a field by, for example, rounding timestamps down to
minute, hour, or even day precision.
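To illustrate the rounding idea, here is a minimal sketch (assuming millisecond epoch timestamps, which is how Elasticsearch stores dates internally; the function names are my own, not part of any API) of truncating values before indexing so the field holds far fewer distinct terms:

```python
def round_to_minute(epoch_ms: int) -> int:
    """Truncate a millisecond epoch timestamp to minute precision,
    so many raw timestamps collapse into one distinct field value."""
    return epoch_ms - (epoch_ms % 60_000)

def round_to_hour(epoch_ms: int) -> int:
    """Truncate a millisecond epoch timestamp to hour precision."""
    return epoch_ms - (epoch_ms % 3_600_000)

ts = 1_357_048_271_123  # an arbitrary millisecond timestamp
print(round_to_minute(ts))  # 1357048260000
print(round_to_hour(ts))    # 1357045200000
```

Indexing the truncated value (instead of, or alongside, the raw one) is what actually shrinks the field-data footprint, since memory scales with distinct terms, not with the query-time interval.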

Is the above all correct?

Thanks,
Otis

Performance Monitoring for ES -
http://sematext.com/spm/elasticsearch-performance-monitoring


(Drew Raines) #2

Otis Gospodnetic wrote:

Reading "Memory Considerations" section at the bottom of
http://www.elasticsearch.org/guide/reference/api/search/facets/histogram-facet.html
.... and want to confirm:

Just because you use the interval functionality to group values from a
field in buckets does not reduce the amount of memory needed.

[...]

In other words, even if you "bucketize", it is worth reducing the
number of distinct values in a field by, for example, rounding up
timestamps to minutes or hours or even days.

Is the above all correct?

You are correct. Regardless of the interval, FieldDataLoader.load()
will still populate, for every Lucene segment, a sparse matrix sized
by the maximum number of docs in the segment times the number of
unique terms in the field across all documents. And it gets worse
once those segments are merged into fewer, larger ones.
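A toy sketch of why the interval doesn't help (this is an illustration of the principle, not Elasticsearch internals): field data must be loaded for every distinct value, while interval bucketing only groups those values at query time.

```python
# 1000 distinct millisecond timestamps, one second apart
values = [t * 1000 for t in range(1000)]

# What field data must hold in memory: every distinct value
distinct_loaded = len(set(values))

# What a histogram facet with interval=100000 reports: far fewer buckets
buckets = len({v - (v % 100_000) for v in values})

print(distinct_loaded, buckets)  # 1000 10
```

Memory is proportional to `distinct_loaded`, so even though the facet returns only 10 buckets, all 1000 values are resident; reducing distinct values at index time is the only way to shrink that.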

https://github.com/elasticsearch/elasticsearch/issues/1531
https://github.com/elasticsearch/elasticsearch/issues/1683

-Drew
