Queries vs filters


(Paul Loy) #1

Hi all,

so it's been mentioned before that filters are more efficient than queries. My
particular use-case is that I want a randomly ordered list of items that have
a flag set. So I do something like this:

{
    "filtered" : {
        "query" : {
            "custom_score" : {
                "query" : "match_all",
                "script" : "random()"
            }
        },
        "filter" : {
            "term" : {
                "curated" : "1"
            }
        }
    }
}

My filter is actually an and_filter with several term filters. I then also
sort on the _score field.
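
For reference, the full request described above (several term filters combined in an and filter, a random script score, and a sort on _score) might look roughly like the body built below. This is only a sketch: the field names `published` and `lang` are hypothetical stand-ins for the other flags, the exact sort syntax is an assumption, and Python is used here just to assemble and print the JSON.

```python
import json

def build_random_order_query(term_filters, order="asc"):
    """Build a filtered query: and-ed term filters plus a random script score."""
    # One term filter per (field, value) pair.
    filters = [{"term": {field: value}} for field, value in term_filters]
    return {
        "query": {
            "filtered": {
                "query": {
                    "custom_score": {
                        # custom_score needs an inner query; match_all is used here.
                        "query": {"match_all": {}},
                        "script": "random()",
                    }
                },
                "filter": {"and": filters},
            }
        },
        # Assumed sort syntax: order the hits by their (random) score.
        "sort": [{"_score": order}],
    }

body = build_random_order_query(
    [("curated", "1"), ("published", "1"), ("lang", "en")]  # last two are hypothetical
)
print(json.dumps(body, indent=2))
```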

My worry is the match_all. If I understand correctly, it will only be run
on the filtered result set, so the script scoring will only occur on that
subset of the actual index. Is this correct? Or rather, is this the best way
to do this?

Many thanks,

Paul.

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy


(Shay Banon) #2

I'm not really sure I understand: your example does not show any match_all query or and filter. (P.S. Can you gist the samples?)


(Paul Loy) #3

Hey Shay,

there's definitely a match_all in there (I admit I use the Java API rather
than the REST one so I copy/pasted this query together as an example :smiley: ).

I'll try to rephrase the question. I basically have a set of terms I want
documents to match. Once they match those terms, I then want to give them a
random order. I'm doing this by using a CustomScoreQuery[Builder] with a
script of 'random()' then addSort("_score", SortOrder.ASC).

Previously it has been hinted that filters are more efficient (which makes
sense). So I've implemented this set of TermFilter[Builder]s as an
AndFilter[Builder]. So I use a FilteredQuery[Builder] with the
CustomScoreQuery[Builder] as the query and the AndFilter[Builder] as the
filter.

The CustomScoreQuery[Builder] then requires a query. So I use a
MatchAllQuery[Builder] as the query.

Does this sound like the most optimal way to build this query?

Many thanks,

Paul.



(Shay Banon) #4

If you use a match_all filtered query, then all documents are going to be traversed. But since the filters you use are term filters (and assuming they are good candidates for caching, i.e. their values tend to repeat), all operations will be done in memory, without any disk lookups, which means it will be very fast.
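
A toy model of that description, in plain Python, may help picture it. It is not how Elasticsearch is implemented: the in-memory "index", the `filter_cache` dict, and the function names are all made up for illustration, and the idea that a score is only computed for documents passing the filter is an assumption of this sketch.

```python
import random

# Toy index: doc id -> fields. Purely illustrative data.
docs = {
    1: {"curated": "1", "lang": "en"},
    2: {"curated": "0", "lang": "en"},
    3: {"curated": "1", "lang": "fr"},
    4: {"curated": "1", "lang": "en"},
}

filter_cache = {}

def term_filter(field, value):
    """Return the set of matching doc ids, computed once and then reused."""
    key = (field, value)
    if key not in filter_cache:
        filter_cache[key] = {d for d, f in docs.items() if f.get(field) == value}
    return filter_cache[key]

def random_order(term_filters, rng):
    # "and" filter: intersect the cached id sets (a fresh set, so the
    # cached sets themselves are never modified).
    matching = set(docs)
    for field, value in term_filters:
        matching &= term_filter(field, value)
    # match_all visits every doc; in this toy model only docs that pass
    # the filter are given a random score.
    scored = [(rng.random(), d) for d in docs if d in matching]
    scored.sort(reverse=True)  # highest score first
    return [d for _, d in scored]

hits = random_order([("curated", "1"), ("lang", "en")], random.Random(42))
print(hits)
```

Repeated calls with the same terms reuse the cached id sets, which is the sense in which frequently repeated term values make good caching candidates.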

A few notes on your usage: I suggest sorting on score in DESC order, which is the native ordering of results (it requires less memory). Also, I pushed a more optimized version of random to master (https://github.com/elasticsearch/elasticsearch/issues/759).
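
Because the scores are random, flipping the sort direction should not make the result any less random. A small sketch of that point, in plain Python with made-up document ids (nothing here calls Elasticsearch):

```python
import random

rng = random.Random(7)
doc_ids = ["a", "b", "c", "d", "e"]  # hypothetical ids
scores = {doc: rng.random() for doc in doc_ids}

ascending = sorted(doc_ids, key=scores.get)
descending = sorted(doc_ids, key=scores.get, reverse=True)

# Descending is just ascending reversed (assuming no tied scores),
# so either direction yields an equally random permutation.
print(ascending)
print(descending)
```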

As to whether that's the best way to accomplish this, I think it's a good way to do it. Tell me how the performance is, and if it's lacking, we can think about how to optimize it.

-shay.banon
