RangeFilter on Numeric Fields - Caching Questions


(aditya tripathi) #1

Hi,

We have a piece of code that uses a RangeFilter on a numeric field (we are changing it to NumericRangeFilter now).
We also sort on the same numeric field, so it probably gets converted to a NumericRangeFilter.

e.g. rangeFilter:
{
  "range" : {
    "postDateTime" : {
      "from" : 0,
      "to" : 120,
      "include_lower" : false,
      "include_upper" : false
    }
  }
}

Q1:
Does this RangeFilter automatically get converted to a NumericRangeFilter?
(asking so, because of the number of instances of NumericRangeFilter in the memory profile of ES)

Q2:
I have a question about the caching of this filter.
Since RangeFilters are cached automatically, will it still be cached even if it gets converted to a NumericRangeFilter?

Q3:
Assume it gets cached.
(Even if it is not, the question still applies to the case where the above range filter does not get converted to a numeric filter.)

How will the filter cache size grow as the "to" value in the above filter changes?

Assume that there are 100 docs in the index.
First range filter results in 50 docs.
The second range filter (with a different "to" value, say 125) results in 51 docs, of which the previous 50 are common.

Will the cache structure be like:
key1 -> bitset1
key2 -> bitset2

Where key1 is of the form : 0-120. And key2 is of the form: 0-125.

Where both bitset1 and bitset2 are of size 100.
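
To make the layout I am asking about concrete, here is a toy Python model. It is only a sketch of my mental picture, not Elasticsearch's actual cache code: the names ToyFilterCache and range_bitset are made up, and treating each distinct (from, to) pair as its own key is an assumption, which is exactly what Q4 asks about.

```python
# Toy model of a filter cache keyed by the exact range bounds.
# Assumption: one entry per distinct (from, to) pair, each holding one bit
# per document in the index. This is NOT Elasticsearch's real implementation.


def range_bitset(values, lower, upper):
    """Return an int used as a bitset: bit i is set when lower < values[i] < upper."""
    bits = 0
    for doc_id, value in enumerate(values):
        if lower < value < upper:  # include_lower=false, include_upper=false
            bits |= 1 << doc_id
    return bits


class ToyFilterCache:
    def __init__(self, values):
        self.values = values
        self.entries = {}  # (lower, upper) -> bitset
        self.hits = 0
        self.misses = 0

    def get(self, lower, upper):
        key = (lower, upper)
        if key in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = range_bitset(self.values, lower, upper)
        return self.entries[key]


def doc_count(bitset):
    """Number of matching documents in a bitset."""
    return bin(bitset).count("1")


# 100 docs: docs 0..49 have values 10..59, doc 50 has value 122,
# and the remaining 49 docs are far outside both ranges.
post_date_time = [10 + i for i in range(50)] + [122] + [1000] * 49
cache = ToyFilterCache(post_date_time)

bitset1 = cache.get(0, 120)  # key1 = (0, 120)
bitset2 = cache.get(0, 125)  # key2 = (0, 125)

print(doc_count(bitset1), doc_count(bitset2), len(cache.entries))
```

In this model the two filters share 50 matching docs but still occupy two separate entries, each sized by the number of docs in the index rather than by the number of matches.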

Q4: Is there any optimization on the number of these filter instances?
For example, if a new range filter with the range 0-122 is executed, will it be stored under a new cache key with a new bitset as its value? Or will this rangeFilter be considered a "cache hit"?

Q5:
For the case where every query fired has a different "to" value of the rangeFilter, can we say that cache hit rate will be very low and it is not useful to cache this?

Q6:
Consider the case where the field postDateTime is also used for sorting, so it is already available in the field cache.

Which will be faster?
Using RangeFilter with caching set to false (fetching from the trie-based index), OR using NumericRangeFilter (passing through all docs)?

Obviously, if the RangeFilter fetches the DocIdPostingList from disk, it will be slow compared to the field cache available to NumericRangeFilter - but is that the case?

Q7:
If a NumericRangeFilter is ANDed or ORed with another filter, say a TermFilter, in what order will the filters be executed? Will the TermFilter always be executed before the NumericRangeFilter, since fewer docs would then have to be passed through?

Thanks for your patience - if you reached this far :slight_smile:

-Aditya.


(Shay Banon) #2

There are two ways to do range filters on numeric fields. The first is the
regular range filter (which translates to Lucene NumericRangeFilter), and it
gets cached by default. The second is the numeric range filter, which works
on the field data cache by comparing the in-memory data for the field for
each hit.
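
As a rough sketch of the two options, here they are as Python dicts that serialize to the filter JSON. The range body mirrors the one in the question; the numeric_range body simply assumes the same field and bounds. The "_cache" flag on the first filter only spells out the default described above, and matches_in_memory is a hypothetical stand-in for the per-hit comparison, not actual Elasticsearch code.

```python
import json

# Option 1: the regular range filter (translates to Lucene NumericRangeFilter).
# "_cache": True is written out only to make the default behaviour explicit.
range_filter = {
    "range": {
        "postDateTime": {
            "from": 0,
            "to": 120,
            "include_lower": False,
            "include_upper": False,
        },
        "_cache": True,
    }
}

# Option 2: the numeric range filter, which checks in-memory field data per hit.
# Same field and bounds as above, assumed for illustration.
numeric_range_filter = {
    "numeric_range": {
        "postDateTime": {
            "from": 0,
            "to": 120,
            "include_lower": False,
            "include_upper": False,
        }
    }
}


def matches_in_memory(field_data, doc_id, lower, upper):
    """Hypothetical per-hit check: compare the loaded field value for one doc
    against exclusive bounds, the way option 2 is described as working."""
    return lower < field_data[doc_id] < upper


print(json.dumps(range_filter, indent=2))
print(json.dumps(numeric_range_filter, indent=2))
```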

On Fri, Apr 27, 2012 at 3:35 PM, aditya tripathi
aditya.tripathi@gmail.com wrote:



