I see what you mean now. In simple facet usage, each time a facet is
clicked, the facet is added as a filter to the query (later on becoming a
boolean filter). But in this case, the results that you get will always be
narrowed down to the query with the filters, so getting facet counts on just
the query (stuff) and not the filter (color:red) is not possible.
So, in order to do that, you will need, for the ones that go outside of the
faceted filtering, to execute another count search for the facets you want
with the original query, which results in unnecessary calls.
This is a nice scenario, and can be solved quite easily actually by adding
to the facet query the ability to override which query it facets on (so some
facets will run on the "master" query, which is "stuff", and others will run
on the filtered query). This solution is heavily based on the fact that
filters are easily cached, so you have the docidsets in memory already.
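To make the idea concrete, here is a minimal sketch of per-facet scope selection, assuming doc id sets are plain Python sets. The names (DOCS, doc_id_set, facet_counts) are hypothetical and made up for illustration; this is not an existing API, just the shape of the override described above.

```python
# Hypothetical in-memory index: doc id -> field values (illustration only).
DOCS = {
    1: {"color": "red", "month": "jan"},
    2: {"color": "red", "month": "feb"},
    3: {"color": "blue", "month": "jan"},
    4: {"color": "green", "month": "mar"},
}


def doc_id_set(field, value):
    """Stand-in for a cached filter: ids of the docs where field == value."""
    return {doc_id for doc_id, fields in DOCS.items() if fields[field] == value}


def facet_counts(field, scope):
    """Count the values of `field` over the doc ids in `scope`."""
    counts = {}
    for doc_id in scope:
        value = DOCS[doc_id][field]
        counts[value] = counts.get(value, 0) + 1
    return counts


# Assumption: every doc matches the "master" query "stuff".
query_hits = set(DOCS)
# The filtered query is the master query intersected with the color:red filter.
filtered_hits = query_hits & doc_id_set("color", "red")

# Each facet chooses which doc id set it runs on.
color_counts = facet_counts("color", query_hits)     # runs on the master query
month_counts = facet_counts("month", filtered_hits)  # runs on the filtered query
```

Since both doc id sets are already in memory, the second facet costs one extra pass over a set rather than another search call.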
I can have a look at bobo browse to see what you are doing; I wouldn't mind
trying to use its facet support instead of reimplementing it myself. There are
some important ground features that I don't want to lose with facets, and
the most important one is to be able to define them dynamically (i.e. per
request there can be different facets) and not define them upfront.
Cheers,
Shay
On Tue, Feb 9, 2010 at 11:28 PM, Jake Mannix jake.mannix@gmail.com wrote:
On Tue, Feb 9, 2010 at 12:47 PM, Shay Banon shay.banon@elasticsearch.com wrote:
Filters keyed on indexreader, ok, fairly straightforward (although if you
want to do multi-select, this will get tricky: if the user selects
"color:red" AND "month:Jan", then you want to filter by both of them for the
search results, but also collect the number of hits on the other colors (as
long as month:Jan matches), and the number of hits on the other months (as
long as the color:red matches), etc...).
Not sure I understand, you can wrap a query with a filter, and then use
that. You will get the count (restricted to the query you ran) of "color:red
AND month:Jan". Unless you mean that you want to get counts for color:red
and also counts for month:Jan, in this case you simply have two facet
queries.
Here's what I mean: if you are displaying facet information for both color
and month, you can let people select from both, so that the results returned
are filtered, as you say, by "color:red AND month:Jan", that is great. But
let's look at what pieces of info the user should have: At first, they have
added no facet filters to query "stuff", and we return all matches for
"stuff", the total count("stuff") as well as some facet data:
{color :
{red : count("stuff AND color:red") },
{blue : count("stuff AND color:blue") },
{green : count("stuff AND color:green") }
},
{month:
{jan : count("stuff AND month:jan") },
{feb : count("stuff AND month:feb") },
{mar : count("stuff AND month:mar") }
}
Now they click on color:red, and we return all the matches for "stuff AND
color:red", along with count("stuff AND color:red"), and facet data:
{color :
{red : count("stuff AND color:red") /* this link won't be clickable
because we're here already */},
{blue : count("stuff AND color:blue") /* this link _is_ clickable, and
applies the filter "color:blue OR color:red" */},
{green : count("stuff AND color:green") /* as with color:blue above */}
},
{month :
{jan : count("stuff AND color:red AND month:jan") },
{feb : count("stuff AND color:red AND month:feb") },
{mar : count("stuff AND color:red AND month:mar") }
}
The counts for color without red being applied should be returned because
we may want to allow users to be able to select a couple of facet values
OR'ed together (within a field - filters across fields are AND'ed, as
usual).
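The combination rule (OR within a field, AND across fields) can be sketched as a small helper that builds the filter string from the current selections. The function name build_filter and the dict-of-lists input are assumptions made for this example only.

```python
def build_filter(selections):
    """Build a filter string from {field: [selected values]}.

    Values within one field are OR'ed; clauses for different fields are AND'ed.
    Fields are emitted in sorted order so the output is deterministic.
    """
    clauses = []
    for field in sorted(selections):
        values = selections[field]
        if not values:
            continue  # nothing selected for this field
        terms = [f"{field}:{value}" for value in values]
        if len(terms) == 1:
            clauses.append(terms[0])
        else:
            clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


example = build_filter({"month": ["jan"], "color": ["blue", "red"]})
```

Here example is "(color:blue OR color:red) AND month:jan", the same shape as the filters described in the facet listings.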
Now comes the tricky part: the user clicks on "month:jan", and we return
results filtered by "stuff AND color:red AND month:jan", along with
count("stuff AND color:red AND month:jan"), and facet data:
{color :
{red : count("stuff AND color:red AND month:jan") /* this link won't be
clickable because we're here already */},
{blue : count("stuff AND color:blue AND month:jan") /* _is_ clickable, and
switches the filter to "month:jan AND (color:blue OR color:red)" */},
{green : count("stuff AND color:green AND month:jan") /* as with color:blue above */}
},
{month :
{jan : count("stuff AND color:red AND month:jan") /* no longer clickable,
we're here already */ },
{feb : count("stuff AND color:red AND month:feb") /* is clickable, and
switches the filter to "(month:jan OR month:feb) AND color:red" */ },
{mar : count("stuff AND color:red AND month:mar") /* similar to month:feb
above */ }
}
This is what the user expects from faceted search in the UI. I'm
pretty sure that Solr computes this as you say, by executing
multiple facet queries, but that is horribly inefficient (especially as the
number of fields to facet on grows). It's much nicer if you can return all of
these counts in one request; it just requires some work to do it
efficiently (this is what we do in bobo-browse).
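As a rough illustration of the counting rule (not how bobo-browse actually implements it), the sketch below computes every facet in one call: each facet field is counted over the docs that match the query plus the selections on all the other fields, ignoring its own selection. DOCS, multi_select_counts, and the set-based scopes are hypothetical, and nothing here is tuned for efficiency.

```python
from collections import Counter

# Hypothetical in-memory index: doc id -> field values (illustration only).
DOCS = {
    1: {"color": "red", "month": "jan"},
    2: {"color": "red", "month": "feb"},
    3: {"color": "blue", "month": "jan"},
    4: {"color": "green", "month": "mar"},
}


def multi_select_counts(query_hits, selections, facet_fields):
    """Return {field: {value: count}} for multi-select faceting.

    For each facet field, the scope is the query hits restricted by the
    selections on every *other* field; the field's own selection is skipped
    so that its unselected values still get counts.
    """
    result = {}
    for field in facet_fields:
        scope = set(query_hits)
        for other, values in selections.items():
            if other != field and values:
                scope = {d for d in scope if DOCS[d][other] in values}
        result[field] = dict(Counter(DOCS[d][field] for d in scope))
    return result


# Assumption: every doc matches "stuff"; the user has selected red and jan.
counts = multi_select_counts(
    query_hits=set(DOCS),
    selections={"color": {"red"}, "month": {"jan"}},
    facet_fields=["color", "month"],
)
```

With this toy data, the color counts are restricted by month:jan only and the month counts by color:red only, which matches the last listing.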
-jake