Filter, facets and filtered query

Hi

I'm refactoring the elastica client at the moment, and I realized that
I'm not completely sure how filters, facets and filtered queries can
be combined. Let's take the last example in the link below:

http://www.elasticsearch.org/guide/reference/api/search/filter.html

We have a query, a filter and a facet. Now I have several questions:

  • Would it be possible to use a filtered query instead of a "standard"
    query?
  • Could I use in addition a facet_filter?
  • When should I use a filtered query or a query in combination with a
    filter?

Heya, inlined:
On Tuesday, March 29, 2011 at 11:14 PM, ruflin wrote:

  • Would it be possible to use a filtered query instead of a "standard"
    query?
    Yes, a filtered query is just another query type, like a term query.
  • Could I use in addition a facet_filter?
    Sure, you can place a facet_filter in each facet, and it accepts any of the available filter types.
  • When should I use a filtered query or a query in combination with a
    filter?
    It depends on what you are after. If you don't use facets, a filtered query might perform better than a query combined with a search filter (though the difference will usually be very small). The top-level filter element is there to allow for simplified facet-based navigation.
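
To make the first two answers concrete, here is a minimal sketch of a request body that combines a filtered query with a per-facet facet_filter. It only builds the JSON in Python and does not talk to a cluster. The function name, its parameters, the facet name "by_field" and the field names (status, tag, lang) are all made up for illustration.

```python
import json


def build_search_body(text, query_filter, facet_field, facet_filter=None):
    """Build a search body: a filtered query plus one terms facet.

    Hypothetical helper; the facet name "by_field" is an arbitrary choice.
    """
    facet = {"terms": {"field": facet_field}}
    if facet_filter is not None:
        # facet_filter sits next to the facet type and narrows only
        # this facet's counts, not the returned hits.
        facet["facet_filter"] = facet_filter
    return {
        "query": {
            "filtered": {
                "query": {"query_string": {"query": text}},
                "filter": query_filter,
            }
        },
        "facets": {"by_field": facet},
    }


body = build_search_body(
    text="foo",
    query_filter={"term": {"status": "published"}},  # assumed field
    facet_field="tag",                               # assumed field
    facet_filter={"term": {"lang": "en"}},           # assumed field
)
print(json.dumps(body, indent=2))
```

Since the filter lives inside the filtered query here, the facet is computed on the filtered documents, and the facet_filter narrows that one facet further.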

Thanks for your answer. Now it makes much more sense. I did not
realize that filtered is just another query type.

I'm also confused about this filter/facet combination. I can see why you would want to add specific filters to facets, but I don't understand why the global filter is not used to compute facets. For example, if you do:
{ "query": ...,
"filter": ...,
"facets": ... }

Then the facet counts are returned without taking the filter into account. This seems illogical to me, as the documentation states that the facets are by default bound to the current query. Either the documentation should be more precise and say that the filter is ignored, or there should be an option to make the filter active in facets too (which I think would be better for both backward compatibility and usability).

Hi Cedric

On Thu, 2011-07-07 at 00:41 -0700, melix wrote:

"the facets are by default bound to the current query"

This means the query that is specified in the "query" parameter.

Normally, if you want a filtered query, you do:

{
  "query": {
    "filtered": {
      "query": { find foo },
      "filter": { filter by bar }
    }
  },
  "facets": { ... } # foo filtered by bar
}

Facets will be bound by the filter specified in that query.

The TOP LEVEL filter parameter was added so that you can get your facet
results on all documents matching the query { find foo }, then later
filter your query results with { filter by bar }, which would look like:

{
  "query": { find foo },
  "filter": { filter by bar },
  "facets": { ... } # foo, not filtered
}

clint
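
For anyone who wants to see the difference between the two request shapes in numbers, here is a small in-memory sketch. It does not use Elasticsearch at all; it only mimics the semantics described above. The documents and the fields (text, color, brand) are invented, "foo in text" stands in for { find foo } and "color is red" stands in for { filter by bar }.

```python
from collections import Counter

# Hypothetical toy documents; all field names are made up.
DOCS = [
    {"text": "foo shirt", "color": "red", "brand": "a"},
    {"text": "foo shoes", "color": "blue", "brand": "a"},
    {"text": "foo hat", "color": "red", "brand": "b"},
    {"text": "bar hat", "color": "red", "brand": "b"},
]


def matches_query(doc):
    return "foo" in doc["text"]      # stands in for { find foo }


def matches_filter(doc):
    return doc["color"] == "red"     # stands in for { filter by bar }


def filtered_query(docs):
    """Filter inside the query: hits and facets both see the filter."""
    hits = [d for d in docs if matches_query(d) and matches_filter(d)]
    facets = Counter(d["brand"] for d in hits)
    return hits, facets


def top_level_filter(docs):
    """Top-level filter: facets see the query only, hits are filtered after."""
    matched = [d for d in docs if matches_query(d)]
    facets = Counter(d["brand"] for d in matched)
    hits = [d for d in matched if matches_filter(d)]
    return hits, facets


print(filtered_query(DOCS)[1])
print(top_level_filter(DOCS)[1])
```

Both variants return the same two hits, but the brand facet counts differ: the filtered query counts only the red documents, while the top-level filter variant counts every document matching the query.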

I understand, but in a faceted search, I would expect everything to be
refined, so the facet counts should IMHO change whenever I add a filter.
I still think an option to have the global filter used by the
facets would be nice :)

On Thu, 2011-07-07 at 11:29 +0200, Cédric Champeau wrote:

I understand, but in a faceted search, I would expect everything to be
refined, so the facet counts should IMHO change whenever I add a filter.
I still think an option to have the global filter used by the
facets would be nice :)

Cedric, I think you missed my point. The top-level filter is there ONLY
for situations where you specifically DON'T want the facets to be bound
by your filter.

So if you use the filtered query, then it will do exactly what you want.

clint

Sure, but I think you are also missing some background about how my queries are
generated. In fact, I have a legacy application which has its own query
format. I am able to map this query format to various search engines,
including Lucene, MG4J and now I am working on Elasticsearch. This
abstract query format is quite old and while I am able to map it to
various search engines, its logic is sometimes slightly different from
the underlying search engine. This abstract model does have a notion of
filter on a query, and I could use it, but there is another problem:

The second point is about the query format. My abstract query model is
an object model. The input format is parsed into this abstract model. I
had XML until now, and I'm adding a JSON format now. I posted a message
yesterday regarding the ability to use the internal JSON parsers from ES
to be able to reuse the Elasticsearch syntax at some points.
Unfortunately, this is not possible unless using the "extraSource"
parameter of a query. With those two points in mind, I have a choice
to make:

  • add the ability to use the Elasticsearch JSON syntax of a query
    filter inside my own JSON query format, but it would require me to write
    parsers for the various filters ES supports
  • add an "extraSource" section into my custom JSON format where the
    user may add a filter. This is the solution I managed to implement.
    While it has the restriction of not being able to add a filter on every
    subquery, it is perfectly acceptable. However, the problem with this
    solution is that the user must use a boolean AND query in order to
    emulate a filter at the query level to have the filters "used" by the
    facets. Indeed, this is not a filter and suffers performance penalties.

Having a flag allowing the "global filter" to be used by facets would
solve this issue for me. The alternative, reimplementing JSON parsers
where I wanted the very same format as Elasticsearch to be supported,
looks unproductive.

I hope this makes it clearer why I think this flag would be useful :D
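
One possible workaround, sketched here under the assumption that the request body can be post-processed as plain JSON on the client side before it is sent: move a top-level "filter" into a filtered query without parsing what is inside the filter. The helper below is hypothetical and not part of any Elasticsearch client; it treats the filter as an opaque value, so no per-filter parser is needed.

```python
import copy


def push_filter_into_query(body):
    """Return a copy of `body` with a top-level "filter" moved into a
    "filtered" query, so that facets are computed on the filtered set.

    Hypothetical helper; the filter content is never inspected.
    """
    body = copy.deepcopy(body)
    flt = body.pop("filter", None)
    if flt is None:
        return body  # nothing to rewrite
    # With no query present, match_all keeps the semantics "filter only".
    query = body.get("query", {"match_all": {}})
    body["query"] = {"filtered": {"query": query, "filter": flt}}
    return body


original = {
    "query": {"term": {"title": "foo"}},   # assumed field
    "filter": {"term": {"tag": "bar"}},    # assumed field
    "facets": {"tags": {"terms": {"field": "tag"}}},
}
rewritten = push_filter_into_query(original)
print(rewritten)
```

Whether this fits depends on where the user-supplied filter ends up in the generated request, so take it as an idea rather than a drop-in fix.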

Sir!
I am new to Elasticsearch. I know a little bit about it, but I don't know
how to configure it on my system. Can anybody help me?
Please write step by step, because after reading the docs I couldn't understand.
Please help me ASAP.

thanks.

Heya,

The reasoning behind the "global" filter not affecting facets is facet navigation. When you start to filter your search results by facets ("navigate them"), some facets should still be computed based on the original user-entered query, and not on the filtering done by the facet selection.

-shay.banon
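
A rough sketch of what such a facet navigation request could look like, built in Python without contacting a cluster. The helper name, its parameters and the fields (color, brand) are invented. The policy it applies, where each facet is filtered by the selections made on the other fields only, is one common UI convention and an assumption on my side, not something prescribed by Elasticsearch.

```python
def navigation_body(user_query, selected, facet_fields):
    """Build a facet-navigation request (hypothetical helper).

    selected: dict of field -> chosen value, e.g. {"color": "red"}.
    Hits are narrowed by every selection through the top-level filter.
    Each facet gets a facet_filter with the selections on the *other*
    fields only, so the options of a field stay visible after a pick.
    """
    def term(field, value):
        return {"term": {field: value}}

    picks = sorted(selected.items())
    body = {"query": user_query, "facets": {}}
    if picks:
        body["filter"] = {"and": [term(f, v) for f, v in picks]}
    for field in facet_fields:
        facet = {"terms": {"field": field}}
        others = [term(f, v) for f, v in picks if f != field]
        if others:
            facet["facet_filter"] = {"and": others}
        body["facets"][field] = facet
    return body


nav = navigation_body(
    user_query={"query_string": {"query": "foo"}},
    selected={"color": "red"},          # assumed field and value
    facet_fields=["color", "brand"],    # assumed fields
)
print(nav)
```

With "red" selected, the color facet is still computed on the plain query and keeps showing the other colors, while the brand facet only counts red documents.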

Here's another example of using a filtered query with a facet:

When I first started using facet queries, they were bringing down the cluster. After looking around in the documentation, I discovered that the query below actually put all messages from the index with my filter field into the filter cache, rather than just the messages I'm interested in.

{
  "query": {
    "query_string": {
      "query": "short_message:(650 OR 500) AND @timestamp:[2013-11-04T21:00:00 TO 2013-11-04T21:00:01]"
    }
  },
  "size": 1,
  "facets": {
    "facet": {
      "terms": {
        "field": "full_message_na",
        "size": 500
      }
    }
  }
}

This query gives the same results but only computes the facet on the results of the filtered query, and it did not bring down the cluster.

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "range": {
              "@timestamp": {
                "from": "2013-11-04T21:00:00",
                "to": "2013-11-04T21:00:05"
              }
            }
          },
          {
            "terms": {
              "short_message": [
                "500",
                "650"
              ]
            }
          }
        ]
      }
    }
  },
  "facets": {
    "my_facet": {
      "terms": {
        "field": "full_message_na",
        "size": 10
      }
    }
  },
  "size": 1
}

The downside to the above query is that it does not use the query string format for the filter, so in order to use it I will have to change the input people are giving my script.

For now I'm still using the old query and have just limited the facet cache to 10% of heap, so it never fills up.
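
One way to keep the query string input unchanged might be the "query" filter, which wraps any query so it can be used where a filter is expected. The sketch below only builds the request body in Python and has not been run against a cluster. It assumes an Elasticsearch version that still has facets and the query filter. The function name and its parameters are made up; the field and facet names are taken from the queries above.

```python
def filtered_from_query_string(user_input, facet_field, facet_size=10):
    """Wrap raw query-string input in a "query" filter inside a filtered
    query, with one terms facet. Hypothetical helper.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                # The "query" filter turns the query_string into a filter.
                "filter": {"query": {"query_string": {"query": user_input}}},
            }
        },
        "facets": {
            "my_facet": {"terms": {"field": facet_field, "size": facet_size}}
        },
        "size": 1,
    }


request = filtered_from_query_string(
    "short_message:(650 OR 500) AND "
    "@timestamp:[2013-11-04T21:00:00 TO 2013-11-04T21:00:01]",
    facet_field="full_message_na",
)
print(request)
```

Whether this behaves like the hand-written range and terms filters with respect to caching and memory is something I would measure before relying on it.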