I am currently running a lot of queries based on text and dates.
I use a bool query that includes a text query and a range query.
The text query and the range query are combined with a "must".
For example,
{
  "query": {
    "bool": {
      "must": [
        {"text": {"title": "my sample query"}},
        {"range": {"date": {"gte": "2012-05-01", "lte": "2012-05-05"}}}
      ]
    }
  }
}
But I found out that the same goal could be achieved by a filtered
query with a range filter.
For example,
{
  "query": {
    "filtered": {
      "filter": {
        "range": {"date": {"gte": "2012-05-01", "lte": "2012-05-05"}}
      },
      "query": {
        "text": {"title": "my sample query"}
      }
    }
  }
}
Which one is the better way to go?
I'd appreciate any kind of advice.
Thanks.
Usually, the filtered option is better. The filtered query caches the
results of the range filter (in an optimized manner), so if another request
uses a range filter with the same bounds, the data is already available from
the cache. The cache is called the filter cache; it is LRU-based with
automatic memory management (by default, 20% of the heap). The node stats
and indices stats APIs report its utilization.
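To make the point about repeated ranges concrete, here is a minimal Python sketch that only builds the request bodies from this thread. The helper names range_filter and filtered_query are made up for illustration and are not part of any Elasticsearch client. The sketch shows that two requests with different text but the same date bounds carry an identical filter clause, which is the situation in which the cached filter result can be reused. How Elasticsearch keys the cache internally is not modelled here.

```python
import json


def range_filter(field, gte, lte):
    """Build a range filter clause (hypothetical helper, plain dict only)."""
    return {"range": {field: {"gte": gte, "lte": lte}}}


def filtered_query(text, date_filter):
    """Wrap a text query on "title" and a filter in a filtered query body."""
    return {
        "query": {
            "filtered": {
                "filter": date_filter,
                "query": {"text": {"title": text}},
            }
        }
    }


# Two requests: different text, same date bounds.
first = filtered_query(
    "my sample query", range_filter("date", "2012-05-01", "2012-05-05")
)
second = filtered_query(
    "another query", range_filter("date", "2012-05-01", "2012-05-05")
)

# The filter clause is identical across both requests, so the second
# request is a candidate for a filter cache hit.
same_filter = (
    first["query"]["filtered"]["filter"]
    == second["query"]["filtered"]["filter"]
)

print(json.dumps(first, indent=2))
print("same filter clause:", same_filter)
```

If the date bounds differ from request to request, the filter clauses differ as well, and the cache gives no reuse for them.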
Thanks, Shay, for the detailed answer.
So, if I understood correctly, basically the only difference between the
two types of query is whether the results are cached or not?
Then if queries are consecutively made with totally non-overlapping date
ranges, the two types of query will give me the same performance?
Filters also do not contribute to the scoring of a document, whereas
additional clauses to a query will.
--
Ivan
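As a rough illustration of the scoring point, the following Python sketch builds both request bodies from the thread side by side. The function names bool_must_body and filtered_body are hypothetical and the code only constructs dictionaries; it does not talk to a cluster. In the bool version the range sits inside "must" next to the text query, so it is a query clause and takes part in scoring. In the filtered version the range sits under "filter", so only the text query is left to produce the score.

```python
def bool_must_body(text, gte, lte):
    """Range as a query clause inside bool/must (takes part in scoring)."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"text": {"title": text}},
                    {"range": {"date": {"gte": gte, "lte": lte}}},
                ]
            }
        }
    }


def filtered_body(text, gte, lte):
    """Range as a filter (restricts matches, does not affect the score)."""
    return {
        "query": {
            "filtered": {
                "filter": {"range": {"date": {"gte": gte, "lte": lte}}},
                "query": {"text": {"title": text}},
            }
        }
    }


as_query = bool_must_body("my sample query", "2012-05-01", "2012-05-05")
as_filter = filtered_body("my sample query", "2012-05-01", "2012-05-05")

# Clauses that can contribute to the score in each variant.
scoring_clauses_bool = as_query["query"]["bool"]["must"]
scoring_clauses_filtered = [as_filter["query"]["filtered"]["query"]]

print(len(scoring_clauses_bool), "scoring clauses in the bool version")
print(len(scoring_clauses_filtered), "scoring clause in the filtered version")
```

Note that on later Elasticsearch releases the "filtered" query no longer exists; there, the non-scoring variant is written as a bool query with a "filter" clause.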
On Wed, May 2, 2012 at 6:41 PM, edward choi mp2893@gmail.com wrote:
Thanks, Shay for the detailed answer.
So, if I understood correctly, basically the only difference between the two
types of query is whether the results are cached or not?
Then if queries are consecutively made with totally non-overlapping date
ranges, the two types of query will give me the same performance?