Hello,
I like ElasticSearch and its features, especially faceted filtering with
result counts, but in some cases the facet filter does not provide the
desired result counts.
By "desired result count" I mean the number of movies that will match once
you apply the filter.
I tried faceted navigation on a demo project called "movie library".
I have a "Movie" document with several "genres", several "production
countries", and one "release year"; these fields are used in the facet
filter.
The "title", "description", and "id" fields are not used in the facet filter.
All documents look like this:
{
  "_index": "movies",
  "_type": "movie",
  "_id": "4",
  "_version": 1,
  "_score": 1,
  "_source": {
    "created_at": "2012-11-26T09:02:51Z",
    "description": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Vivamus vitae risus vitae lorem iaculis placerat. Aliquam sit amet felis. Etiam congue. Donec risus risus, pretium ac, tincidunt eu, tempor eu, quam. Morbi blandit mollis magna. Suspendisse eu tortor. Donec vitae felis nec ligula blandit rhoncus. Ut a pede ac neque mattis facilisis. Nulla nunc ipsum, sodales vitae, hendrerit non, imperdiet ac, ante. Morbi sit amet mi. Ut magna. Curabitur id est. Nulla velit. Sed consectetuer sodales justo. Aliquam dictum gravida libero. Sed eu turpis. Nunc id lorem. Aenean consequat tempor mi. Phasellus in neque. Nunc fermentum convallis ligula.",
    "id": 4,
    "title": "Movie 1",
    "updated_at": "2012-11-26T09:02:51Z",
    "year": 1989,
    "genres": [
      { "id": 24, "title": "Genre 3" },
      { "id": 27, "title": "Genre 6" }
    ],
    "countries": [
      { "id": 15, "title": "China" },
      { "id": 18, "title": "UK" },
      { "id": 13, "title": "USA" }
    ]
  }
}
So there are three taxonomies: "year", "genres", and "countries".
Genres and countries can have zero or more values.
I tried the following facet filter cases:
CASE "A" (facet search on one value per term: "genres.title"=>"Genre 0",
"countries.title"=>"Australia")
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          {
            "terms": {
              "execution": "and",
              "genres.title": ["Genre 0"]
            }
          },
          {
            "terms": {
              "execution": "and",
              "countries.title": ["Australia"]
            }
          }
        ]
      }
    }
  },
  "facets": {
    "global-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    },
    "global-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    }
  },
  "size": 1000
}
The facet counts are fine. If I add a filter on another term ("year"),
specifying the single value "1997" for example, I get the right number of
results after applying the filter.
Faceted search on a single value per term is pretty simple.
CASE "B" (facet search on several values per term: "genres.title"=>["Genre 0",
"Genre 1"], "countries.title"=>["Australia", "USA"], with term filter
"execution": "and")
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          {
            "terms": {
              "execution": "and",
              "genres.title": ["Genre 0", "Genre 1"]
            }
          },
          {
            "terms": {
              "execution": "and",
              "countries.title": ["Australia", "USA"]
            }
          }
        ]
      }
    }
  },
  "facets": {
    "global-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    },
    "global-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    }
  },
  "size": 1000
}
The facet counts are fine here too. Drilling down leaves only a very small
number of movies: those that were produced in both the USA and Australia and
at the same time belong to both Genre 0 and Genre 1, i.e. ("USA" AND
"Australia") AND ("Genre 0" AND "Genre 1").
Every time you apply a filter on a term that is already filtered, it adds
another "AND" condition inside the term, so the movie must have been produced
by several countries at the same time.
With every new value added to a term with "AND" logic we get fewer and fewer
results, and the count next to each facet value gets smaller and smaller. All
counts are correct: each one shows the number of results you will get after
adding that value to the term filter.
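To illustrate why the counts line up in the "AND" case, here is a minimal Python sketch rather than an ElasticSearch query. The MOVIES list and the helper names filter_all and scoped_counts are invented for illustration; they only mimic, under that assumption, what the "and" filters and the scoped facets compute.

```python
# Hypothetical in-memory stand-in for the "movies" index; the data is invented.
MOVIES = [
    {"id": 1, "genres": {"Genre 0", "Genre 1"}, "countries": {"Australia", "USA"}},
    {"id": 2, "genres": {"Genre 0"}, "countries": {"Australia", "USA"}},
    {"id": 3, "genres": {"Genre 0", "Genre 1"}, "countries": {"USA"}},
    {"id": 4, "genres": {"Genre 0", "Genre 1", "Genre 2"},
     "countries": {"Australia", "USA", "UK"}},
]


def filter_all(movies, selected):
    """Keep movies that carry every selected value of every field
    (AND inside each field, AND between fields)."""
    return [m for m in movies
            if all(values <= m[field] for field, values in selected.items())]


def scoped_counts(movies, field):
    """Count how many of the given movies carry each value of `field`."""
    counts = {}
    for m in movies:
        for value in m[field]:
            counts[value] = counts.get(value, 0) + 1
    return counts


selected = {"genres": {"Genre 0", "Genre 1"},
            "countries": {"Australia", "USA"}}
hits = filter_all(MOVIES, selected)
counts = scoped_counts(hits, "genres")

# Adding one more genre with AND logic narrows the hits to exactly the
# movies that were counted for that genre in the scoped counts.
narrowed = filter_all(MOVIES, {**selected,
                               "genres": selected["genres"] | {"Genre 2"}})
print(len(narrowed), counts["Genre 2"])
```

With "AND" inside a term, the count of a value over the current hits is by definition the number of hits that remain after adding that value, which is why the counts in CASE "B" look right.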
CASE "C" - the facet count behavior that I need, but that does not work
correctly (facet search on several values per term: "genres.title"=>["Genre 0",
"Genre 1"], "countries.title"=>["Australia", "USA"], with term filter
"execution": "bool" or the default "execution")
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          {
            "terms": {
              "execution": "bool",
              "_cache": true,
              "genres.title": ["Genre 0", "Genre 1"]
            }
          },
          {
            "terms": {
              "execution": "bool",
              "_cache": true,
              "countries.title": ["Australia", "USA"]
            }
          }
        ]
      }
    }
  },
  "facets": {
    "global-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    },
    "global-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    }
  },
  "size": 1000
}
In this case, the facet counts are not fine.
With a term filter like ("USA" OR "Australia") AND ("Genre 0" OR "Genre 1"),
the facet counts do not show the number of results you will actually get
after adding a new value to the term filter.
Yet "CASE C" is the desired behavior, the most user-friendly of all the cases
above.
We can drill down into the results by adding values to a filter that is not
applied yet (for example, specifying a value for the year filter will shrink
the results),
and at the same time we can expand the results by adding values to a filter
that is already applied (for example, adding another genre to a genre filter
that already has some genres selected will expand the results because of the
"OR" logic between the values of a term).
Again, the desired behavior for the count next to a term value is: when the
term filter has no values applied yet, show the number of results you will
get after adding that value, like "Genre 0 (25)".
When some values are already applied to the term filter, show the positive
delta (the number of results by which the search will be expanded), like
"Genre 1 (+3)".
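To make the desired numbers concrete, here is a small Python sketch of the counting rule described above, computed over an in-memory list instead of ElasticSearch. The MOVIES data and the helpers filter_any, desired_label, and scoped_count are invented for illustration. The sketch assumes "OR" between the values of one term and "AND" between different terms, and it contrasts the desired label with a count taken over the current hits only.

```python
# Hypothetical in-memory stand-in for the "movies" index; the data is invented.
MOVIES = [
    {"id": 1, "year": {1997}, "genres": {"Genre 0"}, "countries": {"USA"}},
    {"id": 2, "year": {1997}, "genres": {"Genre 1"}, "countries": {"Australia"}},
    {"id": 3, "year": {1989}, "genres": {"Genre 0", "Genre 2"}, "countries": {"USA"}},
    {"id": 4, "year": {1989}, "genres": {"Genre 2"}, "countries": {"USA"}},
    {"id": 5, "year": {1989}, "genres": {"Genre 2"}, "countries": {"Australia"}},
    {"id": 6, "year": {1989}, "genres": {"Genre 2"}, "countries": {"UK"}},
]


def filter_any(movies, selected):
    """OR between the values of one field, AND between different fields.
    Fields with an empty selection do not restrict the result."""
    return [m for m in movies
            if all(m[field] & values
                   for field, values in selected.items() if values)]


def scoped_count(hits, field, value):
    """Count over the current hits only: hits that carry `value`."""
    return sum(1 for m in hits if value in m[field])


def desired_label(movies, selected, field, value):
    """Label for one facet value: an absolute count when the field has no
    selection yet, a positive delta when it already has selected values."""
    current = filter_any(movies, selected)
    widened = dict(selected)
    widened[field] = selected.get(field, set()) | {value}
    after = filter_any(movies, widened)
    if selected.get(field):
        return f"{value} (+{len(after) - len(current)})"
    return f"{value} ({len(after)})"


selected = {"genres": {"Genre 0", "Genre 1"},
            "countries": {"Australia", "USA"}}
current = filter_any(MOVIES, selected)

print(desired_label(MOVIES, selected, "genres", "Genre 2"))  # delta for an already filtered term
print(desired_label(MOVIES, selected, "year", 1997))         # absolute count for an unfiltered term
print(scoped_count(current, "genres", "Genre 2"))            # count over the current hits only
```

In this invented data set, the count over the current hits for "Genre 2" only sees movies that already match "Genre 0" or "Genre 1", so it differs from the delta I would like to display.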
These sites have the right ("USA" OR "Australia") AND ("Genre 0" OR "Genre 1")
facet count behavior:
http://www.zappos.com/tech-accessories~1#!/cases/
http://rozetka.com.ua/notebooks/c80004/filter/producer=asus;25800=20879/
(translated version of this site:
http://translate.google.com/translate?hl=uk&sl=ru&tl=en&u=http%3A%2F%2Frozetka.com.ua%2Fnotebooks%2Fc80004%2Ffilter%2Fproducer%3Dasus%3B25800%3D20879%2F
)
Some terms, like countries and genres, can have several values per movie and
at the same time allow filtering on multiple values using "OR" logic.
Some terms, like year, can have only one value per movie, but still allow
filtering on multiple values using "OR" logic.
At the same time, "AND" logic is used between different terms, as shown above.
How can I write the filter query or configure the facets so that they display
the right counts for ("USA" OR "Australia") AND ("Genre 0" OR "Genre 1"):
either the number of results that will be displayed after the filter is
applied, or the positive delta (like +3 results) when the term already has
some values selected?
Thank you,
Alex