Hi David
I keep hearing that I should be using filters rather than queries
because they are much quicker
and because their results can be cached. Depending on the filter type,
some are cached by default, some are not (see "Filter Caching" in the
Elasticsearch documentation)
but I'm not sure how I should be using
them. Up to now I have been using QueryStringQueryBuilder to
generate Lucene syntax string queries.
You can't use filters on a query string search. You have to use a
request body search:
i) If I do a sort should I always be using filters because the sort
removes the relevance ranking and so makes the only advantage of
queries (i.e. relevance score) useless?
If you are not sorting on _score, then yes, rather use filters.
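As a rough sketch of that case (not the only way to do it), the request
body below filters on listId and sorts on lastName, both taken from your
example. It uses the constant_score wrapper explained further down. It is
written as a Python dict purely so it can be printed as JSON, and the
ascending sort order is an assumption.

```python
import json

# Filter-only search with an explicit sort, so no relevance scoring is needed.
# "listId" and "lastName" come from the question; the "asc" order is assumed.
sorted_body = {
    "query": {
        "constant_score": {
            "filter": {"term": {"listId": "52"}}
        }
    },
    "sort": [{"lastName": {"order": "asc"}}],
}

print(json.dumps(sorted_body, indent=2))
```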
ii) Do filters have to be on a basic query to start with?
e.g. basic query ( search term=_all:smith) + filters (listId="52",
firstName=smith, lastName=smith, companyName=smith)
If ii) is true, what should be my basic Query - should it be a search
on all fields, as above, that is then filtered by fields or should I
also specify the fields in the query?
Here are 3 variations:
- Query only:
{ query: { text: { _all: "foo bar" }}}
- Filter only:
{ query: {
constant_score: {
filter: { term: { status: "open" }}
}
}}
- Query and Filter:
{ query: {
filtered: {
query: { text: { _all: "foo bar" }},
filter: { term: { status: "open" }}
}
}}
So:
- You always need to wrap your query in a top-level query element
- A "constant_score" query says "all docs are equal", so no scoring
has to happen - just the filter gets applied
- In the third example, filter reduces the number of docs that
can be matched (and scored) by the query
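If you are building these bodies in code, the choice between the three
variations can be sketched roughly as below. This is only an illustration:
build_search_body is a hypothetical helper, not part of any client library,
and it just assembles the same JSON structures shown above.

```python
import json

def build_search_body(query=None, filt=None):
    """Pick the wrapper for a query, a filter, or both (hypothetical helper)."""
    if query is not None and filt is not None:
        # Query and filter: the filter narrows the docs the query has to score.
        inner = {"filtered": {"query": query, "filter": filt}}
    elif filt is not None:
        # Filter only: constant_score means no scoring has to happen.
        inner = {"constant_score": {"filter": filt}}
    elif query is not None:
        # Query only: the query is used as-is.
        inner = query
    else:
        raise ValueError("need a query, a filter, or both")
    # Everything is always wrapped in the top-level "query" element.
    return {"query": inner}

text_query = {"text": {"_all": "foo bar"}}
status_filter = {"term": {"status": "open"}}

print(json.dumps(build_search_body(text_query, status_filter), indent=2))
```

Passing only filt gives the constant_score form, and passing only query
gives the plain query form.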
There is also a top-level filter argument:
{
query: { text: { _all: "foo bar" }},
filter: { term: { status: "open" }}
}
For normal usage, you should NOT use this version. Its purpose is
different from the "filtered" query mentioned above.
This is intended only to be used when you want to:
- run a query
- filter the results
- BUT show facets on the UNFILTERED results
So this filter will be less efficient than the "filtered" query.
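To make the facet case concrete, here is a minimal sketch of such a body.
The helper name build_faceted_body and the facet field "tag" are made up for
illustration, and it assumes a terms facet. The facet is computed over
everything the query matches, while the top-level filter only narrows the
hits that are returned.

```python
import json

def build_faceted_body(query, filt, facet_field):
    """Top-level filter plus a terms facet (hypothetical helper)."""
    return {
        # The query decides which docs the facet counts are computed over.
        "query": query,
        # The top-level filter is applied to the returned hits only.
        "filter": filt,
        # One terms facet, named after the field it counts.
        "facets": {facet_field: {"terms": {"field": facet_field}}},
    }

faceted_body = build_faceted_body(
    {"text": {"_all": "foo bar"}},
    {"term": {"status": "open"}},
    "tag",  # assumed field name, not from the original example
)

print(json.dumps(faceted_body, indent=2))
```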
clint