Filters vs Queries


(davrob) #1

Hi,

I keep hearing that I should be using filters rather than queries
because they are much quicker but I'm not sure how I should be using
them. Up to now I have been using QueryStringQueryBuilder to
generate Lucene syntax string queries.

Here are my questions:

i) If I do a sort, should I always be using filters? The sort
removes the relevance ranking, which makes the only advantage of
queries (i.e. the relevance score) useless.

ii) Do filters have to be on a basic query to start with?

e.g. basic query (search term=_all:smith) + filters (listId="52",
firstName=smith, lastName=smith, companyName=smith)

If ii) is true, what should be my basic Query - should it be a search
on all fields, as above, that is then filtered by fields or should I
also specify the fields in the query?

Best Regards,

David.


(Clinton Gormley) #2

Hi David

> I keep hearing that I should be using filters rather than queries
> because they are much quicker

and because their results can be cached. Depending on the filter type,
some are cached by default and some are not (see Filter Caching on
http://www.elasticsearch.org/guide/reference/query-dsl/).

> but I'm not sure how I should be using
> them. Up to now I have been using QueryStringQueryBuilder to
> generate Lucene syntax string queries.

You can't use filters on a query string search. You have to use a
request body search:
http://www.elasticsearch.org/guide/reference/api/search/request-body.html

> i) If I do a sort should I always be using filters because the sort
> removes the relevance ranking and so makes the only advantage of
> queries (i.e. relevance score) useless.

If you are not sorting on _score, then yes, use filters instead.
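A minimal sketch of the sorted, filter-only case, assuming the 2011-era Query DSL used in this thread. The field names `listId` and `lastName` come from David's example, and the helper `sorted_filter_only` is hypothetical. Python is only used to assemble and print the JSON body; no client library is assumed:

```python
import json


def sorted_filter_only(list_id, sort_field):
    """Build a request body that filters on an exact listId and sorts on a field.

    constant_score gives every matching doc the same score, which costs
    nothing here because the hits are ordered by sort_field, not by _score.
    """
    return {
        "query": {
            "constant_score": {
                # Exact-value filter: no relevance scoring involved.
                "filter": {"term": {"listId": list_id}}
            }
        },
        # Hypothetical sort field; any non-_score sort makes scoring pointless.
        "sort": [{sort_field: "asc"}],
    }


body = sorted_filter_only("52", "lastName")
print(json.dumps(body, indent=2))
```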

> ii) Do filters have to be on a basic query to start with?

> e.g. basic query ( search term=_all:smith) + filters (listId="52",
> firstName=smith, lastName=smith, companyName=smith)

> If ii) is true, what should be my basic Query - should it be a search
> on all fields, as above, that is then filtered by fields or should I
> also specify the fields in the query?

Here are 3 variations:

  • Query only:

    { "query": { "text": { "_all": "foo bar" }}}

  • Filter only:

    { "query": {
        "constant_score": {
            "filter": { "term": { "status": "open" }}
        }
    }}

  • Query and Filter:

    { "query": {
        "filtered": {
            "query":  { "text": { "_all": "foo bar" }},
            "filter": { "term": { "status": "open" }}
        }
    }}

So:

  1. You always need to wrap your query in a top-level query element
  2. A "constant_score" query says "all docs are equal", so no scoring
    has to happen - just the filter gets applied
  3. In the third example, the filter reduces the number of docs that
    can be matched (and scored) by the query
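The three variations can also be assembled programmatically. This is a sketch in Python that only builds and prints the JSON bodies; the helper names are hypothetical, and `text`/`filtered` are the query types of the Elasticsearch version discussed here, which later versions renamed or removed:

```python
import json

# Assumes the 2011-era Query DSL used in this thread ("text", "filtered").


def query_only(text):
    # Full-text, scored search over the _all field.
    return {"query": {"text": {"_all": text}}}


def filter_only(field, value):
    # constant_score: every doc passing the filter gets the same score.
    return {"query": {"constant_score": {"filter": {"term": {field: value}}}}}


def query_and_filter(text, field, value):
    # The filter narrows the candidate docs; the query scores what is left.
    return {
        "query": {
            "filtered": {
                "query": {"text": {"_all": text}},
                "filter": {"term": {field: value}},
            }
        }
    }


for body in (
    query_only("foo bar"),
    filter_only("status", "open"),
    query_and_filter("foo bar", "status", "open"),
):
    # Every body has exactly one top-level "query" element.
    assert list(body) == ["query"]
    print(json.dumps(body))
```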

There is also a top-level filter argument:

{
    "query":  { "text": { "_all": "foo bar" }},
    "filter": { "term": { "status": "open" }}
}

For normal usage, you should NOT use this version. Its purpose is
different from the "filtered" query mentioned above.

This is intended only to be used when you want to:

  • run a query
  • filter the results
  • BUT show facets on the UNFILTERED results

So this filter will be less efficient than the "filtered" query.
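To make the facet behaviour concrete, here is a toy in-memory model in Python. It is an illustration under stated assumptions, not how Elasticsearch executes a search: the top-level filter trims the returned hits after facet counts are taken, while the "filtered" query applies the filter before anything is counted. The documents and the word-matching stand-in for a `text` query are made up for the example:

```python
from collections import Counter

# Toy model only: made-up docs, and a crude word match instead of a real
# text query. Not Elasticsearch's actual execution strategy.
docs = [
    {"id": 1, "text": "foo bar", "status": "open"},
    {"id": 2, "text": "foo baz", "status": "closed"},
    {"id": 3, "text": "bar qux", "status": "open"},
    {"id": 4, "text": "nothing here", "status": "open"},
]


def matches_text(doc, words):
    # Stand-in for a text query: any query word appears in the doc.
    return any(w in doc["text"].split() for w in words.split())


def search_with_top_level_filter(words, status):
    query_hits = [d for d in docs if matches_text(d, words)]
    # Facet counts see every query hit, regardless of the filter.
    facets = Counter(d["status"] for d in query_hits)
    # The top-level filter only trims the hits that are returned.
    hits = [d for d in query_hits if d["status"] == status]
    return hits, facets


def search_with_filtered_query(words, status):
    # "filtered" query: the filter runs first, so facets only see filtered docs.
    candidates = [d for d in docs if d["status"] == status]
    hits = [d for d in candidates if matches_text(d, words)]
    facets = Counter(d["status"] for d in hits)
    return hits, facets


hits, facets = search_with_top_level_filter("foo bar", "open")
print([d["id"] for d in hits], dict(facets))  # [1, 3] {'open': 2, 'closed': 1}
hits, facets = search_with_filtered_query("foo bar", "open")
print([d["id"] for d in hits], dict(facets))  # [1, 3] {'open': 2}
```

Both searches return the same hits; only the facet counts differ, which is the case the top-level filter exists for.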

clint


(davrob) #3

thanks clint, that's a brilliant explanation.

(In reply to Clinton Gormley, Aug 2, 6:36 pm)



(davrob) #4

Is it possible to use a filter to search for an exact value on a field,
as opposed to a simple token match? I used this a lot to make joins to
other entities' ids that are not in the index.

e.g. many contacts are in many lists, which means that for contacts I might
have data like this:

Contact 1: { "listIds": [2, 3, 4, 100, 125, 325] }
Contact 2: { "listIds": [200, 325] }

To get all the contacts in list 325, in Lucene syntax I would make a
query like this: (listIds:"325")

(In reply to davrob2, Aug 3, 10:28 am)



(Shay Banon) #5

You can have a term filter on listIds with a value of 325.
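A sketch of that term filter for David's data, assuming `listIds` is indexed as a numeric (or otherwise not analyzed) multi-valued field, so that 325 is stored as one exact term. The local `term_filter_matches` helper is a hypothetical toy model of the matching rule, not an Elasticsearch API:

```python
import json

# Contacts from David's example, with listIds stored as arrays of numbers.
contacts = {
    "Contact 1": {"listIds": [2, 3, 4, 100, 125, 325]},
    "Contact 2": {"listIds": [200, 325]},
}

# Request body: an exact-value term filter, wrapped in constant_score
# because list membership needs no relevance score.
body = {"query": {"constant_score": {"filter": {"term": {"listIds": 325}}}}}
print(json.dumps(body))


def term_filter_matches(doc, field, value):
    # Toy rule: a term filter on a multi-valued field matches if ANY value is equal.
    return value in doc.get(field, [])


matching = [name for name, doc in contacts.items()
            if term_filter_matches(doc, "listIds", 325)]
print(matching)  # ['Contact 1', 'Contact 2']
```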

(In reply to davrob2, Wed, Aug 3, 2011 at 2:09 PM)



(Jason-2) #6

Is there a performance difference between using a constant_score
query and a filtered query? All of our current queries are generated
as filtered queries, and I'm concerned that this might be slower than
the constant_score method. Examples:

  • Constant Score:

    { "query": {
        "constant_score": {
            "filter": { "term": { "status": "open" }}
        }
    }}

  • Filtered Query:

    { "query": {
        "filtered": {
            "query":  { "match_all": {} },
            "filter": { "term": { "status": "open" }}
        }
    }}

(In reply to Shay Banon, Aug 3, 7:30 am)



(davrob) #7

thanks Shay,

So what filter should I use if I want a normal tokenized match? Or
are we then in the realm of having to create a query first and then filter
it?

(In reply to Shay Banon, Aug 3, 12:30 pm)



(Shay Banon) #8

A filtered query with a match_all query is automatically converted
internally to a constant_score one, so guess which one is better? :)
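Conceptually, the rewrite described here looks like the following. `simplify` is a hypothetical helper written for illustration only; it is not Elasticsearch's internal code, and it only handles an explicit `match_all` inner query:

```python
# Hypothetical illustration of the rewrite: a "filtered" query whose inner
# query is match_all contributes no useful scoring, so it can be expressed
# as a constant_score query over the same filter.


def simplify(query):
    filtered = query.get("filtered")
    if filtered is not None and filtered.get("query") == {"match_all": {}}:
        return {"constant_score": {"filter": filtered["filter"]}}
    # Anything else (e.g. a filtered query with a real text query) is unchanged.
    return query


jason_filtered = {
    "filtered": {
        "query": {"match_all": {}},
        "filter": {"term": {"status": "open"}},
    }
}
constant = {"constant_score": {"filter": {"term": {"status": "open"}}}}

assert simplify(jason_filtered) == constant

# A filtered query with a real text query is left alone.
scored = {
    "filtered": {
        "query": {"text": {"_all": "foo"}},
        "filter": {"term": {"status": "open"}},
    }
}
assert simplify(scored) == scored
```

If the conversion happens as described, the two forms in Jason's examples end up equivalent, and writing constant_score directly simply states that up front.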

(In reply to Jason, Wed, Aug 3, 2011 at 3:06 PM)



(Shay Banon) #9

Yeah, if you really want to use a query as a filter, you can wrap the
query in a query filter
(http://www.elasticsearch.org/guide/reference/query-dsl/query-filter.html),
or just provide the query to a constant_score query.
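A sketch of both options, assuming the 2011-era DSL where a `query` filter wraps any query and `constant_score` accepts either a `filter` or a `query`. The `firstName`/`smith` values come from David's example; Python only builds and prints the JSON:

```python
import json

# A tokenized full-text query ("text" is the 2011-era name used in this thread).
text_query = {"text": {"firstName": "smith"}}

# Option 1: wrap the query in a "query" filter, then use it like any other filter.
as_filter = {
    "query": {
        "constant_score": {
            "filter": {"query": text_query}
        }
    }
}

# Option 2: hand the query straight to constant_score, which also accepts a query.
as_constant_score = {
    "query": {
        "constant_score": {"query": text_query}
    }
}

for body in (as_filter, as_constant_score):
    print(json.dumps(body))
```

Either way, every matching doc gets the same score; the choice mainly matters if you want to combine the wrapped query with other filters.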

(In reply to davrob2, Wed, Aug 3, 2011 at 3:21 PM)


