Apply query after filter


(George Sakkis) #1

Hi all,

According to the docs, a FilteredQuery is a query that applies a filter
to the results of another query. Is there a way to reverse the order
so that the subquery runs on the results of the filter? At least in my
case this would have a big performance impact, as the filter limits the
candidate set to a few hundred docs, down from millions.

Thanks,
George


(Shay Banon) #2

The actual execution is interleaving, so you are good.
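
One way to picture "interleaved" execution is a leapfrog intersection of two sorted doc id streams, where whichever side is behind skips ahead to the other side's current doc. The following is only a minimal Python sketch of that idea, not Lucene's actual implementation; the function name and the sample doc ids are made up for illustration.

```python
from bisect import bisect_left


def leapfrog_intersect(query_docs, filter_docs):
    """Intersect two sorted doc-id lists by advancing whichever side is behind."""
    hits = []
    i = j = 0
    while i < len(query_docs) and j < len(filter_docs):
        q, f = query_docs[i], filter_docs[j]
        if q == f:
            # Both sides agree on this doc: it matches query AND filter.
            hits.append(q)
            i += 1
            j += 1
        elif q < f:
            # Query side is behind: skip it forward to the first doc id >= f.
            i = bisect_left(query_docs, f, i)
        else:
            # Filter side is behind: skip it forward to the first doc id >= q.
            j = bisect_left(filter_docs, q, j)
    return hits


# Hypothetical doc ids: the query matches many docs, the filter only a few.
query_docs = [1, 4, 7, 9, 12, 15, 20]
filter_docs = [4, 9, 10, 20]
print(leapfrog_intersect(query_docs, filter_docs))
```

In this picture neither side is fully evaluated first, so a selective filter already lets the query side skip most of its candidates.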


(George Sakkis) #3

Unfortunately this doesn't seem to be enough. At least for broad
prefix queries, there is a two-orders-of-magnitude difference on my
index. Here's my original query; it takes ~650ms for 25 hits:

{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "user_id": [
            291773
          ]
        }
      },
      "query": {
        "prefix": {
          "_all": {
            "prefix": "m"
          }
        }
      }
    }
  }
}

Replacing the prefix query with a match_all returns 147 hits in only
5ms!

{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "app_id": [
            291773
          ]
        }
      },
      "query": {
        "match_all": {}
      }
    }
  }
}

The workaround I resorted to for now is to use a PrefixFilter instead
of a PrefixQuery and rely on caching, but ideally I would like to use a
query and specify/override the execution order. Is there any way to
force the query to run on the result set of the filter?


(Clinton Gormley) #4

Hi George

On Mon, 2011-12-19 at 01:22 -0800, George Sakkis wrote:

Unfortunately this doesn't seem to be enough. At least for broad
prefix queries, there is a two-orders-of-magnitude difference on my
index. Here's my original query; it takes ~650ms for 25 hits:

The problem with your query is not filter vs query; it is the use of the
prefix query (or prefix filter, for that matter).

It is not an efficient search. It has to:

  • load all terms
  • find all terms beginning with 'm'
  • limit those to the 1024 most relevant 'm' terms
  • do 1024 searches

Prefix clauses are OK when you have few terms, but not for general use.
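
The steps above can be sketched roughly as follows. This is a simplified Python model, not the real Lucene code: the `postings` dictionary stands in for the index, "most relevant" is approximated by simply keeping the first 1024 matching terms, and all names are hypothetical. The point it illustrates is that the expansion works on the terms of the whole index, independent of how few documents a filter would match.

```python
from bisect import bisect_left

MAX_CLAUSES = 1024  # the clause limit mentioned in the list above


def expand_prefix(sorted_terms, prefix, max_clauses=MAX_CLAUSES):
    """Return up to max_clauses terms from a sorted term list that start with prefix."""
    # All terms sharing a prefix are contiguous in a sorted term list.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    # Stand-in for "most relevant": keep the first max_clauses terms.
    return matches[:max_clauses]


def prefix_search(postings, prefix):
    """Expand the prefix into terms, then run one lookup per expanded term."""
    terms = sorted(postings)  # "load all terms"
    doc_ids = set()
    for term in expand_prefix(terms, prefix):
        doc_ids |= postings[term]  # one search per expanded term
    return doc_ids


# Hypothetical tiny index: term -> ids of the docs containing it.
postings = {"mango": {1}, "mary": {2, 3}, "melon": {3}, "apple": {4}, "zebra": {5}}
print(expand_prefix(sorted(postings), "m"))
print(sorted(prefix_search(postings, "m")))
```

With a one-letter prefix on a large index, the expansion step alone touches a large share of the term dictionary before any document is matched.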

What you want to do is to prepare your data properly, i.e. index the
appropriate fields with the 'edgeNGram' token filter, which will produce
terms like:

'm','ma','mar','mary'

Then a search for e.g. 'ma' will be quick, because 'ma' is a single
indexed term.

Note: your search_analyzer should NOT include the edgeNGram filter,
otherwise it'll search for 'm','ma' etc.
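
To make the idea concrete, here is a small Python sketch of edge n-gram indexing. It only imitates what the token filter does and is not Elasticsearch code: `edge_ngrams`, `build_index`, `search`, the sample documents, and the `min_gram`/`max_gram` defaults are all assumptions chosen for illustration. Index time expands each token into its leading substrings, while search time looks up the typed text as a single term.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the leading substrings of token, from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(docs):
    """Index-time analysis: lowercase, split on whitespace, expand to edge n-grams."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for gram in edge_ngrams(token):
                index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, user_input):
    """Search-time analysis: lowercase only, no n-gram expansion, one term lookup."""
    return index.get(user_input.lower(), set())


# Hypothetical documents.
docs = {1: "Mary Poppins", 2: "Mark Twain", 3: "Tom Sawyer"}
index = build_index(docs)
print(edge_ngrams("mary"))
print(sorted(search(index, "ma")))
```

The trade-off is a larger index in exchange for turning every prefix search into a plain single-term lookup.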

Have a look at this previous reply for more detail:

http://elasticsearch-users.115913.n3.nabble.com/help-needed-with-the-query-tt3177477.html#a3178856

clint


(George Sakkis) #5

Hi Clinton,

On Dec 19, 10:58 am, Clinton Gormley cl...@traveljury.com wrote:

Prefix clauses are OK when you have few terms, but not for general use.

The thing is it's not general use in my case; ideally the prefix
search would be applied to only the 147 documents that match the other
filter, if only I could specify the execution order.

George


(Shay Banon) #6

That's not really how prefix/wildcard queries work. In Lucene, there is a
rewrite phase for queries that happens before the actual query execution,
and in this phase prefix/wildcard queries can be expensive.

