Hi
So first, the top-level "filter" parameter should only be used when you
want to facet on unfiltered results but filter the search hits. At all
other times, you should use a "filtered" query instead.
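To make the distinction concrete, here is a minimal Python sketch of the two request bodies. The helper function names and the facet name "genders" are hypothetical and only there for illustration; the "gender" field comes from your example.

```python
import json


def hits_filtered_facets_unfiltered(field, value):
    # Top-level "filter": narrows the returned hits only. Facets are still
    # computed over everything the query matched.
    return {
        "query": {"match_all": {}},
        "filter": {"term": {field: value}},
        # "genders" is a hypothetical facet name, not taken from the thread.
        "facets": {"genders": {"terms": {"field": field}}},
    }


def everything_filtered(field, value):
    # "filtered" query: the filter is part of the query itself, so it
    # narrows the documents before hits (and any facets) are computed.
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(hits_filtered_facets_unfiltered("gender", "m"), indent=2))
    print(json.dumps(everything_filtered("gender", "m"), indent=2))
```

If you have no facets that need to see the unfiltered set, the second form is the one to reach for.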
That said, your filter on gender probably matches about 25 million
documents, which you are then sorting. I'm guessing that the sort is
taking a disproportionate amount of time.
Normally, a filtered query will try to apply the filters before running the
query. In your example, where the filter matches lots of documents, if you
were to combine it with a simple query (e.g. { match: { name: "ezekiel" }})
then the query may actually be faster than the filter, as "ezekiel" is
likely to appear in far fewer documents than gender "m". The filtered
query does try to detect these cases, but this can also be controlled
by the undocumented "strategy" parameter.
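As a rough sketch of what that combination could look like, the Python below builds a filtered query with a selective match query and the broad gender filter. The function name is hypothetical. Since "strategy" is undocumented, the value "query_first" used here is an assumption on my part; check the source for your version before relying on it.

```python
import json


def filtered_query(name, gender, strategy=None):
    filtered = {
        # Selective query: "ezekiel" should match far fewer docs than the filter.
        "query": {"match": {"name": name}},
        # Broad filter: matches roughly half the index in this example.
        "filter": {"terms": {"gender": [gender]}},
    }
    if strategy is not None:
        # ASSUMPTION: "strategy" sits next to "query" and "filter", and the
        # accepted values depend on the Elasticsearch version in use.
        filtered["strategy"] = strategy
    return {"query": {"filtered": filtered}}


if __name__ == "__main__":
    # Default behaviour: let the filtered query pick the execution order.
    print(json.dumps(filtered_query("ezekiel", "m"), indent=2))
    # Explicit hint; "query_first" is an assumed value, not confirmed here.
    print(json.dumps(filtered_query("ezekiel", "m", strategy="query_first"), indent=2))
```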
However, queries usually come from users, and it is difficult to know in
advance if they are going to be simple (and fast) or complex (and slower)
queries. Using the default strategy for filtered at least gives you some
consistency in response times, by reducing the total number of docs that
the query has to examine.
Your example where you include several must clauses is doing just that -
reducing the total number of docs that the query needs to examine by a much
larger percentage than your first query.
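A toy model of why the extra must clauses help, in Python. This is not Elasticsearch's actual code, just an illustration of ANDing per-filter bitsets; the index size and the rules deciding which docs match each filter are made up.

```python
def bitset(num_docs, predicate):
    # Represent a filter result as one Python int, with bit i set when
    # doc i matches.
    bits = 0
    for doc_id in range(num_docs):
        if predicate(doc_id):
            bits |= 1 << doc_id
    return bits


def count(bits):
    # Number of matching docs (set bits).
    return bin(bits).count("1")


NUM_DOCS = 1000  # made-up index size

# Made-up match rules standing in for the real field values.
gender_f = bitset(NUM_DOCS, lambda d: d % 2 == 0)
loc_sa = bitset(NUM_DOCS, lambda d: d % 5 == 0)
sr_loc_1 = bitset(NUM_DOCS, lambda d: d % 3 == 0)

# One broad filter leaves half the index for the query and the sort.
single = gender_f
# Several must clauses are ANDed together, leaving far fewer docs.
combined = gender_f & loc_sa & sr_loc_1

if __name__ == "__main__":
    print("single filter:", count(single), "docs")
    print("combined filters:", count(combined), "docs")
```

The fewer docs survive the filters, the less work the query and the sort have to do.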
Note: all cached filters are the same size; the size doesn't depend on how
many docs match. Each one uses a bitset to represent every doc in the
index, with each bit set to either 1 or 0.
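A small Python sketch of that point, assuming a plain one-bit-per-doc layout. Real caches have some overhead on top, so treat the numbers as approximate. The function names are hypothetical.

```python
def bitset_bytes(num_docs):
    # One bit per doc in the index, rounded up to whole bytes.
    return (num_docs + 7) // 8


def build_bitset(num_docs, matching_doc_ids):
    # Allocate the full bitset up front, then flip on the matching docs.
    bits = bytearray(bitset_bytes(num_docs))
    for doc_id in matching_doc_ids:
        bits[doc_id // 8] |= 1 << (doc_id % 8)
    return bits


NUM_DOCS = 1_000_000  # kept small so the demo runs quickly

sparse = build_bitset(NUM_DOCS, [3, 17, 42])            # 3 matching docs
dense = build_bitset(NUM_DOCS, range(0, NUM_DOCS, 2))   # half the index

if __name__ == "__main__":
    # Both bitsets occupy the same number of bytes.
    print(len(sparse), len(dense))
    # Approximate size for an index of 50 million docs, in bytes.
    print(bitset_bytes(50_000_000))
```

So under this model a cached filter over 50 million docs costs a little over 6 MB whether it matches three docs or 25 million.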
Clint
On 10 July 2013 20:56, Martin Konecny martin.konecny@gmail.com wrote:
Consider the following simple query
{
  "query":{
    "match_all":{
    }
  },
  "filter":{
    "bool":{
      "must":[
        {
          "terms":{
            "gender":[
              "m"
            ]
          }
        }
      ]
    }
  },
  "sort":[
    {
      "sub":{
        "order":"desc"
      }
    }
  ],
  "from":10,
  "size":10
}
This was taking 800ms when run on 50 million records. I tried speeding
things up using "filtered", but the response time remains the same:
{
  "query":{
    "filtered":{
      "query":{
        "match_all":{
        }
      },
      "filter":{
        "bool":{
          "must":[
            {
              "terms":{
                "gender":[
                  "m"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "sort":[
    {
      "sub":{
        "order":"desc"
      }
    }
  ],
  "from":10,
  "size":10
}
Note that in these two queries, the "must" parameter is:
[{"terms":{"gender":["m","f",""]}}]
If I increase the "must" parameters to
[
{"term":{"sr_loc":"1"}},
{"range":{"birth_es_date":{"from":"19770101","to":"19970527"}}},
{"term":{"loc":"SA"}},
{"terms":{"gender":["f"]}}
]
then there is a huge difference between before and after the "filtered"
optimization (the response time drops from 800ms to 30ms).
Is it because the simpler "must" parameter returns a much larger result
set which cannot be cached?
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.