Historically, date range searches have been hard to get right on top of
Lucene, both performance-wise and memory-wise. The main reason has been
that a range search was rewritten into a logical OR of all the terms
that fell into the range. If the range is large enough (e.g. a long
time span), you get an OR of a huge number of terms, with the expected
impact on runtime. Sometimes you even exceed the maximum number of
clauses allowed in a Lucene BooleanQuery.
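To make the term explosion concrete, here is a minimal sketch in plain Java with no Lucene dependency. The TreeSet is a hypothetical stand-in for a field's term dictionary (one term per indexed epoch second), and clausesForRange mimics the old rewrite of a range into one OR clause per matching term. The class and method names are made up for illustration; 1024 is Lucene's long-standing default BooleanQuery clause limit.

```java
import java.util.TreeSet;

public class NaiveRangeExpansion {
    // Lucene's long-standing default for BooleanQuery's maximum clause count.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Naive rewrite: every indexed term inside [from, to] becomes one OR clause.
    static int clausesForRange(TreeSet<Long> indexedTerms, long from, long to) {
        return indexedTerms.subSet(from, true, to, true).size();
    }

    public static void main(String[] args) {
        TreeSet<Long> terms = new TreeSet<>();
        long start = 1_262_304_000L; // 2010-01-01T00:00:00Z in epoch seconds
        // Simulate one distinct timestamp per second over one day.
        for (long s = 0; s < 86_400; s++) {
            terms.add(start + s);
        }
        int clauses = clausesForRange(terms, start, start + 3_600 - 1);
        System.out.println("clauses for one hour: " + clauses); // prints 3600
        System.out.println("exceeds limit: " + (clauses > MAX_CLAUSE_COUNT)); // prints true
    }
}
```

One hour of second-precision timestamps already yields 3,600 clauses, well past that limit.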
A common (and good) solution to the date range problem in the past was
to decompose the date at indexing time. You would basically index a
date_time into six fields (year, month, ..., second) and build an
appropriate logical AND clause at query time. The main advantage of
this approach is that it bounds the number of distinct terms per field,
instead of producing one term per distinct millisecond value from a
practically unbounded set. The second advantage is flexible precision
matching: to get all documents of a month you don't even need a range,
you just match on the year and month fields. Is there any support for
this kind of decomposition in the ES mapping?
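A rough sketch of the decomposition approach in plain Java, assuming UTC and hypothetical field names year, month, day, hour, minute and second. Each field has a small, bounded set of values (12 months, at most 31 days, 24 hours, 60 minutes, 60 seconds), and the month lookup becomes two required term clauses written in Lucene's classic query syntax. This only illustrates the idea; it is not something ES provides.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

public class DateDecomposition {
    // Splits a timestamp into one value per hypothetical field (year ... second), in UTC.
    static Map<String, Integer> decompose(long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        Map<String, Integer> fields = new LinkedHashMap<>();
        fields.put("year", t.getYear());
        fields.put("month", t.getMonthValue());
        fields.put("day", t.getDayOfMonth());
        fields.put("hour", t.getHour());
        fields.put("minute", t.getMinute());
        fields.put("second", t.getSecond());
        return fields;
    }

    // "All documents of a month" as two required term clauses; no range query involved.
    static String monthQuery(int year, int month) {
        return "+year:" + year + " +month:" + month;
    }

    public static void main(String[] args) {
        long ts = 1_267_617_723_000L; // 2010-03-03T12:02:03Z
        System.out.println(decompose(ts));
        // prints {year=2010, month=3, day=3, hour=12, minute=2, second=3}
        System.out.println(monthQuery(2010, 3));
        // prints +year:2010 +month:3
    }
}
```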
I know that range searches have improved at the Lucene level via the
TrieField concept for numeric fields. Does ES use this, and does it
completely solve the performance issue? And what about index size when
storing lots of timestamps with second precision (given the number of
terms that ends up in the index)?
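For comparison, a simplified sketch of the trie idea as I understand it: each value is indexed at several precisions (dropping `step` low bits per level), so a range can be covered by a few coarse terms in the middle and fine-grained terms only at the ragged edges. This is a hand-written illustration, not Lucene's actual NumericUtils code; the class and method names and the precision step of 4 are assumptions for the example, and it assumes non-negative values well below Long.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

public class TrieRangeSketch {
    // One trie term: a value with its lowest `shift` bits dropped, so it
    // stands for the block [prefix << shift, ((prefix + 1) << shift) - 1].
    static final class Term {
        final int shift;
        final long prefix;
        Term(int shift, long prefix) { this.shift = shift; this.prefix = prefix; }
        long lo() { return prefix << shift; }
        long hi() { return ((prefix + 1) << shift) - 1; }
    }

    // Covers [lo, hi] exactly: fine terms at the edges, coarser terms toward the middle.
    // Assumes 0 <= lo <= hi and values well below Long.MAX_VALUE.
    static List<Term> split(long lo, long hi, int step) {
        List<Term> terms = new ArrayList<>();
        for (int shift = 0; lo <= hi; shift += step) {
            long unit = 1L << shift;                // block size at this level
            long mask = (1L << (shift + step)) - 1; // next-level block size minus 1
            long midLo = (lo + mask) & ~mask;       // lo rounded up to a next-level boundary
            long midHi = ((hi + 1) & ~mask) - 1;    // hi rounded down to a next-level boundary
            if (midLo > midHi) {
                // No full next-level block fits: finish with terms at this level.
                for (long v = lo; v <= hi; v += unit) terms.add(new Term(shift, v >>> shift));
                break;
            }
            for (long v = lo; v < midLo; v += unit) terms.add(new Term(shift, v >>> shift));
            for (long v = midHi + 1; v <= hi; v += unit) terms.add(new Term(shift, v >>> shift));
            lo = midLo; // the middle part is handled at the next, coarser level
            hi = midHi;
        }
        return terms;
    }

    public static void main(String[] args) {
        long start = 1_262_304_000L;         // 2010-01-01T00:00:00Z in epoch seconds
        long end = start + 30L * 86_400 - 1; // 30 days at second precision
        List<Term> terms = split(start, end, 4);
        // Naive count assumes every second in the range occurs as a term.
        System.out.println("naive terms: " + (end - start + 1));
        System.out.println("trie terms:  " + terms.size());
    }
}
```

The trade-off sits on the indexing side: each value has to be indexed once per precision level (roughly 64 / step terms for a 64-bit value), which is exactly where the index size question comes in.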