Apply query after filter

George_Sakkis · December 16, 2011, 10:01am

Hi all,

according to the docs a FilteredQuery is a query that applies a filter
to the results of another query. Is there a way to reverse the order
so that the subquery runs on the results of the filter? At least in my
case this would have a big performance impact, as the filter limits the
candidate set to a few hundred docs, down from millions.

Thanks,
George

kimchy · December 16, 2011, 5:05pm

The actual execution is interleaving, so you are good.
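
As a rough illustration of what interleaved execution means (a simplified sketch, not Lucene's actual code): the filter and the query each expose their matching doc IDs in sorted order, and the two are advanced in lockstep, each skipping ahead to the other's current candidate. The function name `leapfrog_intersect` and the sample doc IDs are made up for this example.

```python
from bisect import bisect_left


def leapfrog_intersect(filter_docs, query_docs):
    """Intersect two sorted doc-ID lists by letting each side skip ahead
    to the other's current candidate (hypothetical helper, not a Lucene API)."""
    hits = []
    i = j = 0
    while i < len(filter_docs) and j < len(query_docs):
        a, b = filter_docs[i], query_docs[j]
        if a == b:
            hits.append(a)
            i += 1
            j += 1
        elif a < b:
            # Filter side is behind: jump to the first doc ID >= b.
            i = bisect_left(filter_docs, b, i)
        else:
            # Query side is behind: jump to the first doc ID >= a.
            j = bisect_left(query_docs, a, j)
    return hits


filter_docs = [3, 40, 41, 900]          # few docs pass the filter
query_docs = list(range(0, 1000, 2))    # many docs match the query
print(leapfrog_intersect(filter_docs, query_docs))
```

In this model neither side is run to completion over the other's "results"; a selective filter lets the query side skip most of its candidates.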

George_Sakkis · December 19, 2011, 9:22am

Unfortunately this doesn't seem to be enough. At least for broad
prefix queries, there is a two orders of magnitude difference in my
index. Here's my original query; it takes ~650ms for 25 hits:

{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "user_id": [
            291773
          ]
        }
      },
      "query": {
        "prefix": {
          "_all": {
            "prefix": "m"
          }
        }
      }
    }
  }
}

Replacing the prefix query with a match_all returns 147 hits in only
5ms!

{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "app_id": [
            291773
          ]
        }
      },
      "query": {
        "match_all": {}
      }
    }
  }
}

The workaround I resorted to for now is to use a PrefixFilter instead
of a PrefixQuery and rely on caching, but ideally I would like to use a
query and specify/override the execution order. Is there any way to force
the query to run on the result set of the filter?

Clinton_Gormley · December 19, 2011, 9:58am

Hi George

On Mon, 2011-12-19 at 01:22 -0800, George Sakkis wrote:

Unfortunately this doesn't seem to be enough, at least for broad
prefix queries there is a two orders of magnitude difference in my
index. Here's my original query, it takes ~650ms for 25 hits:

The problem with your query is not filter vs query, it is the use of the
prefix query (or filter for that matter).

It is not an efficient search. It has to:

1. load all terms
2. find all terms beginning with 'm'
3. limit those to the 1024 most relevant 'm' terms
4. do 1024 searches

prefix clauses are ok when you have few terms, but not for general use.
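
The steps above can be sketched roughly as follows. This is a toy model under stated assumptions, not how Lucene is implemented: the term dictionary is a sorted Python list, `expand_prefix` and `MAX_CLAUSES` are hypothetical names, and the cap simply keeps the first terms found rather than ranking them. The point is that the expansion walks the term dictionary of the whole index, so its cost depends on how many terms share the prefix, not on how many documents a filter lets through.

```python
from bisect import bisect_left

MAX_CLAUSES = 1024  # cap on expanded terms, taken from the figure quoted above


def expand_prefix(sorted_terms, prefix, max_clauses=MAX_CLAUSES):
    """Return up to max_clauses terms from sorted_terms that start with prefix."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
        if len(matches) == max_clauses:
            break
    return matches


# Toy term dictionary for the whole index (made-up data).
terms = sorted(["apple", "map", "mary", "mat", "moon", "zebra"]
               + [f"m{n:05d}" for n in range(5000)])

broad = expand_prefix(terms, "m")     # very many candidate terms, hits the cap
narrow = expand_prefix(terms, "mar")  # only a handful of terms
print(len(broad), narrow)
```

A one-letter prefix runs into the clause cap, while a longer prefix expands to only a few terms, which is why broad prefixes are the expensive case.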

What you want to do is to prepare your data properly, i.e. index
appropriate fields with the 'edgeNGram' token filter, which will produce
terms like:

'm','ma','mar','mary'

Then a search for e.g. 'ma' will be quick.

Note: your search_analyzer should NOT include the edgeNGram filter,
otherwise it'll search for 'm','ma' etc
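
To make the idea concrete, here is a small sketch of what an edge n-gram token filter produces at index time, and why the search side should look up the typed text as a single term. It is plain Python rather than Elasticsearch configuration; `edge_ngrams`, `index_terms`, and the sample documents are hypothetical.

```python
from collections import defaultdict


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the leading substrings of token: 'mary' -> m, ma, mar, mary."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(docs):
    """Build a toy inverted index: every edge n-gram maps to its doc IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for gram in edge_ngrams(token):
                index[gram].add(doc_id)
    return index


docs = {1: "Mary Smith", 2: "Mark Jones", 3: "Sam Miller"}  # made-up data
index = index_terms(docs)

print(edge_ngrams("mary"))
# Search side: look up the typed text as ONE exact term, no n-gramming.
print(sorted(index.get("mar", set())))
```

If the search text 'mar' were itself n-grammed into 'm', 'ma', 'mar', the 'm' term would also match document 3 ("Miller"), which is the unwanted behaviour the note above warns about.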

Have a look at this previous reply for more detail:

http://elasticsearch-users.115913.n3.nabble.com/help-needed-with-the-query-tt3177477.html#a3178856

clint

George_Sakkis · December 19, 2011, 1:01pm

Hi Clinton,

On Dec 19, 10:58 am, Clinton Gormley cl...@traveljury.com wrote:

prefix clauses are ok when you have few terms, but not for general use.

The thing is it's not general use in my case; ideally the prefix
search would be applied to only the 147 documents that match the other
filter, if only I could specify the execution order.

George

kimchy · December 20, 2011, 3:01pm

That's not really how prefix/wildcard queries work. In Lucene, there is a
rewrite phase for queries that happens before the actual query execution,
and in this phase prefix/wildcard queries can be expensive.
