Using the interval functionality to group values from a field into buckets does not by itself reduce the amount of memory needed.
For example, if there are 1000 distinct values in field X and you run a
histogram facet on X that buckets those values into 100 buckets, the memory
needed to hold the field data is still the same as if no interval had been used for bucketing.

In other words, even if you "bucketize", it is still worth reducing the number of
distinct values in a field by, for example, rounding timestamps down to the
nearest minute, hour, or even day.
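The rounding idea above can be sketched as a small pre-indexing step: truncate each timestamp down to the start of its hour so the field carries at most 24 distinct values per day instead of one per millisecond. This is a minimal Python sketch under that assumption; the function name is hypothetical, not part of any Elasticsearch API.

```python
from datetime import datetime, timezone

def round_to_hour(ts_millis: int) -> int:
    """Truncate an epoch-millisecond timestamp to the start of its hour,
    collapsing many distinct values into at most 24 per day."""
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    truncated = dt.replace(minute=0, second=0, microsecond=0)
    return int(truncated.timestamp() * 1000)

# Two events 20 minutes apart map to the same indexed value:
a = round_to_hour(1_700_000_000_000)
b = round_to_hour(1_700_001_200_000)
assert a == b
```

Applying this at index time (rather than relying on the facet's interval at query time) is what actually shrinks the set of unique terms the field data has to hold.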


Is the above all correct?

You are correct. Regardless of the interval, FieldDataLoader.load()
will still populate a sparse matrix for every Lucene segment, sized by
the maximum number of docs in the segment times the number of unique
terms in the field across all documents. And it gets worse once those
segments are merged into fewer, larger ones.
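As a back-of-the-envelope illustration of why the interval does not help: the per-segment structure scales with docs times unique terms, so only reducing the field's cardinality shrinks it. The sizing function and numbers below are hypothetical, for illustration only, not the actual loader's accounting.

```python
def field_data_cells(max_docs_per_segment: int, unique_terms: int) -> int:
    """Cells in a hypothetical per-segment docs-by-terms structure; its size
    depends on unique terms in the field, not on the histogram interval."""
    return max_docs_per_segment * unique_terms

# Millisecond timestamps vs. hour-rounded timestamps, one day of data,
# in a segment of one million docs:
raw = field_data_cells(1_000_000, 86_400_000)  # up to one term per millisecond
rounded = field_data_cells(1_000_000, 24)      # at most 24 terms per day
assert rounded < raw
```

Choosing a coarser interval at query time changes neither factor, which is why the memory footprint stays the same.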
