The difference between range query and range filter?


(mp2893) #1

Hi,

I am currently making a lot of queries based on text and dates.

I use a bool query that includes a text query and a range query.
The text query and the range query are combined under "must".
For example,
{
  "query": {
    "bool": {
      "must": [
        {"text": {"title": "my sample query"}},
        {"range": {"date": {"gte": "2012-05-01", "lte": "2012-05-05"}}}
      ]
    }
  }
}

But I found out that the same goal can be achieved with a filtered
query that uses a range filter.
For example,
{
  "query": {
    "filtered": {
      "filter": {
        "range": {"date": {"gte": "2012-05-01", "lte": "2012-05-05"}}
      },
      "query": {
        "text": {"title": "my sample query"}
      }
    }
  }
}

Which one is the better way to go?
I'd appreciate any kind of advice.
Thanks.

Ed
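
For anyone building these requests programmatically, the two request bodies in the post above could be sketched in Python roughly as follows. The helper names `bool_must_body` and `filtered_body` are hypothetical, not part of any client library, and the `text` and `filtered` query names are simply the ones used in this thread; later Elasticsearch versions renamed or replaced them, so check the documentation for your version.

```python
import json


def bool_must_body(field, text, date_field, gte, lte):
    """Hypothetical helper: text query and range query combined under 'must'."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"text": {field: text}},
                    {"range": {date_field: {"gte": gte, "lte": lte}}},
                ]
            }
        }
    }


def filtered_body(field, text, date_field, gte, lte):
    """Hypothetical helper: text query wrapped in a filtered query with a range filter."""
    return {
        "query": {
            "filtered": {
                "filter": {"range": {date_field: {"gte": gte, "lte": lte}}},
                "query": {"text": {field: text}},
            }
        }
    }


# Values taken from the examples in the post.
args = ("title", "my sample query", "date", "2012-05-01", "2012-05-05")
print(json.dumps(bool_must_body(*args), indent=2))
print(json.dumps(filtered_body(*args), indent=2))
```

Both helpers carry the same text and the same date bounds; only the place where the range sits (a scoring clause versus a filter) differs.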


(Shay Banon) #2

Usually, the filtered option is better. It caches the filter's results
(in an optimized manner), so if another range filter repeats with the
same range, the data is already available from the cache. This cache is
called the filter cache; it is LRU-based with automatic memory management
(by default, 20% of the heap). The node stats and indices stats APIs
report its utilization.
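
To make the caching idea concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not Elasticsearch's implementation: the class `ToyFilterCache`, its `max_entries` limit, and the sample documents are all invented for illustration, and the real filter cache is sized by memory rather than by entry count.

```python
from collections import OrderedDict
from datetime import date


class ToyFilterCache:
    """Toy LRU cache of range-filter results, keyed by (field, gte, lte)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries  # assumption: bounded by count, not memory
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def matching_ids(self, docs, field, gte, lte):
        key = (field, gte, lte)
        if key in self.entries:
            # Same range seen before: reuse the stored result.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        # New range: evaluate it against every document and store the result.
        self.misses += 1
        ids = frozenset(
            doc_id for doc_id, doc in docs.items() if gte <= doc[field] <= lte
        )
        self.entries[key] = ids
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return ids


DOCS = {
    1: {"date": date(2012, 5, 1)},
    2: {"date": date(2012, 5, 4)},
    3: {"date": date(2012, 5, 9)},
}

cache = ToyFilterCache(max_entries=2)
first = cache.matching_ids(DOCS, "date", date(2012, 5, 1), date(2012, 5, 5))
again = cache.matching_ids(DOCS, "date", date(2012, 5, 1), date(2012, 5, 5))
other = cache.matching_ids(DOCS, "date", date(2012, 5, 6), date(2012, 5, 10))
print(sorted(first), sorted(other), cache.hits, cache.misses)
# prints: [1, 2] [3] 1 2
```

In this model the repeated range is a hit, while the new range is a miss and has to be computed from scratch, so the cache only pays off when the same ranges recur.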


(mp2893) #3

Thanks, Shay, for the detailed answer.
So, if I understood correctly, basically the only difference between the
two types of query is whether the results are cached or not?
Then if queries are consecutively made with totally non-overlapping date
ranges, the two types of query will give me the same performance?

Regards,
Ed


(Ivan Brusic) #4

Filters also do not contribute to the score of a document, whereas
additional clauses added to a query do.

--
Ivan
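
The scoring difference can be illustrated with a small toy model in Python. This is a sketch only: the term-counting `text_score`, the constant `range_score`, and the sample documents are assumptions made up for the illustration and do not reproduce Lucene's actual scoring.

```python
from datetime import date

DOCS = [
    {"title": "my sample query", "date": date(2012, 5, 2)},
    {"title": "another sample", "date": date(2012, 5, 4)},
    {"title": "my sample query", "date": date(2012, 6, 1)},
]


def text_score(doc, text):
    """Toy relevance: number of query terms that appear in the title."""
    title_terms = set(doc["title"].split())
    return float(sum(1 for term in text.split() if term in title_terms))


def in_range(doc, gte, lte):
    return gte <= doc["date"] <= lte


def search_with_range_clause(docs, text, gte, lte, range_score=1.0):
    """Range as a query clause: its (assumed constant) score is added in."""
    return [
        (doc["title"], text_score(doc, text) + range_score)
        for doc in docs
        if text_score(doc, text) > 0 and in_range(doc, gte, lte)
    ]


def search_with_range_filter(docs, text, gte, lte):
    """Range as a filter: it only includes or excludes, the score is untouched."""
    return [
        (doc["title"], text_score(doc, text))
        for doc in docs
        if text_score(doc, text) > 0 and in_range(doc, gte, lte)
    ]


start, end = date(2012, 5, 1), date(2012, 5, 5)
as_clause = search_with_range_clause(DOCS, "my sample query", start, end)
as_filter = search_with_range_filter(DOCS, "my sample query", start, end)
print(as_clause)  # [('my sample query', 4.0), ('another sample', 2.0)]
print(as_filter)  # [('my sample query', 3.0), ('another sample', 1.0)]
```

Both variants return the same documents; only the scores differ, because the filter never takes part in scoring.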


(mp2893) #5

So for strict filtering, I should use filters rather than combined
queries.
Thanks for the info, Ivan.

Best,
Ed

