Faceted search counts for multi-valued terms

Hello,

I like ElasticSearch and its features, especially faceted filtering with
result counts, but there are some cases where the facet filter does not
provide the desired result count.
By "desired result count" I mean the number of movies that satisfy the
filter after you apply it.

I tried faceted navigation on a demo project called "movie library".

I have a "Movie" document that can have several "genres", several
"production countries" and one "release year"; these fields are used in
the facet filter.
The "title", "description" and "id" fields are not used in the facet filter.

All documents look like this one, as shown by the ElasticSearch Head plugin:

{
  "_index": "movies",
  "_type": "movie",
  "_id": "4",
  "_version": 1,
  "_score": 1,
  "_source": {
    "created_at": "2012-11-26T09:02:51Z",
    "description": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Vivamus vitae risus vitae lorem iaculis placerat. Aliquam sit amet felis. Etiam congue. Donec risus risus, pretium ac, tincidunt eu, tempor eu, quam. Morbi blandit mollis magna. Suspendisse eu tortor. Donec vitae felis nec ligula blandit rhoncus. Ut a pede ac neque mattis facilisis. Nulla nunc ipsum, sodales vitae, hendrerit non, imperdiet ac, ante. Morbi sit amet mi. Ut magna. Curabitur id est. Nulla velit. Sed consectetuer sodales justo. Aliquam dictum gravida libero. Sed eu turpis. Nunc id lorem. Aenean consequat tempor mi. Phasellus in neque. Nunc fermentum convallis ligula.",
    "id": 4,
    "title": "Movie 1",
    "updated_at": "2012-11-26T09:02:51Z",
    "year": 1989,
    "genres": [
      { "id": 24, "title": "Genre 3" },
      { "id": 27, "title": "Genre 6" }
    ],
    "countries": [
      { "id": 15, "title": "China" },
      { "id": 18, "title": "UK" },
      { "id": 13, "title": "USA" }
    ]
  }
}

So, there are 3 taxonomies: "year", "genres", "countries".
Genres and countries can have none or several values.

I tried the following facet filter cases:

CASE "A" (facet search only on one value per term: "genres.title"=>"Genre 0",
"countries.title"=>"Australia")
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "terms": {
              "execution": "and",
              "genres.title": ["Genre 0"]
            }
          },
          {
            "terms": {
              "execution": "and",
              "countries.title": ["Australia"]
            }
          }
        ]
      }
    }
  },
  "facets": {
    "global-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    },
    "global-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    }
  },
  "size": 1000
}
Faceted search counts are fine. If I add a filter on another term ("year")
specifying a single value, "1997" for example, I get the right count of
results after applying the filter.
Facet search on a single-valued term is pretty simple.

CASE "B" (facet search on several values per term: "genres.title"=>["Genre 0",
"Genre 1"], "countries.title"=>["Australia", "USA"] with term
"execution": "and")
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "terms": {
              "execution": "and",
              "genres.title": ["Genre 0", "Genre 1"]
            }
          },
          {
            "terms": {
              "execution": "and",
              "countries.title": ["Australia", "USA"]
            }
          }
        ]
      }
    }
  },
  "facets": {
    "global-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    },
    "global-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    }
  },
  "size": 1000
}
Faceted search counts are fine. Using the "drill-down" functionality we get
a very small number of movies that were produced in both the USA and
Australia and at the same time belong to both Genre 0 and Genre 1:
("USA" AND "Australia") AND ("Genre 0" AND "Genre 1")

Every time you apply a filter on a term that is already filtered, it adds
another "AND" condition inside that term, so the movie must be produced by
several countries at the same time.
With every new value added to a term with "AND" logic we get fewer and
fewer results, and smaller and smaller counts next to each facet value in
the filter. All counts are correct and show the number of results you will
get after adding a new value to a term filter.

CASE "C" - the facet count behavior that is needed, but doesn't work right
(facet search on several values per term: "genres.title"=>["Genre 0",
"Genre 1"], "countries.title"=>["Australia", "USA"] with term
"execution": "bool" or with the default "execution")
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "terms": {
              "execution": "bool",
              "_cache": true,
              "genres.title": ["Genre 0", "Genre 1"]
            }
          },
          {
            "terms": {
              "execution": "bool",
              "_cache": true,
              "countries.title": ["Australia", "USA"]
            }
          }
        ]
      }
    }
  },
  "facets": {
    "global-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-genres": {
      "terms": {
        "field": "genres.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    },
    "global-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      },
      "global": true
    },
    "scoped-countries": {
      "terms": {
        "field": "countries.title",
        "size": 1000,
        "all_terms": false,
        "order": "term"
      }
    }
  },
  "size": 1000
}
In this case, the faceted search counts are not fine.
When you have a term filter like ("USA" OR "Australia") AND ("Genre 0" OR
"Genre 1"), the facet counts do not show the real number of results you
will get after adding a new value to the term filter.

And this "CASE C" behavior is the desired one; it is the most user-friendly
of all the cases above.
We can drill down the results by adding values to a filter that is not
applied yet (for example, specifying a value for the year filter will
shrink the results), but at the same time we can expand the results by
adding values to a filter that is already applied (for example, specifying
another genre for a genre filter that already has some genres selected will
expand the results, because of the "OR" logic between the term values).
Again, the desired count next to a term value is the number of results you
will get after adding that value to the term filter, when the filter has no
values applied yet, like "Genre 0 (25)".
When some values are already applied to a term filter, the count should
instead be the positive delta (the number of results by which the search
will be expanded), like "Genre 1 (+3)".
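The desired counts can be stated precisely with a small Python sketch, assuming OR between the values of one field and AND between fields; the toy movies and the helper names are hypothetical and the code only models the semantics, not Elasticsearch. It also checks that the same number comes out when only the filters on the other fields are applied, keeping documents that contain the value but none of the values already selected for that field.

```python
# Hypothetical toy data standing in for the movie index.
MOVIES = [
    {"genres": {"Genre 0"}, "countries": {"USA"}},
    {"genres": {"Genre 1"}, "countries": {"Australia"}},
    {"genres": {"Genre 2"}, "countries": {"USA"}},
    {"genres": {"Genre 0", "Genre 2"}, "countries": {"China"}},
    {"genres": {"Genre 1"}, "countries": {"UK", "USA"}},
]


def matches(movie, selection, skip=None):
    """OR within a field, AND across fields; the filter on `skip` is ignored."""
    return all(
        movie[field] & values
        for field, values in selection.items()
        if values and field != skip
    )


def count(selection):
    return sum(1 for m in MOVIES if matches(m, selection))


def desired_count(selection, field, value):
    """Absolute count if `field` has no selection yet, otherwise the positive delta."""
    current = selection.get(field, set())
    expanded = dict(selection)
    expanded[field] = current | {value}
    if not current:
        return count(expanded)  # shown as e.g. "Genre 0 (25)"
    return count(expanded) - count(selection)  # shown as e.g. "Genre 1 (+3)"


def via_other_filters(selection, field, value):
    """Same number, using only the other fields' filters plus a check on `field`."""
    current = selection.get(field, set())
    return sum(
        1
        for m in MOVIES
        if matches(m, selection, skip=field)
        and value in m[field]
        and not (m[field] & current)
    )
```

With genres = {"Genre 0"} and countries = {"USA"} selected there is 1 hit, and "Genre 1" would show "+1" because selecting it adds the UK/USA movie. The second function hints at why a facet restricted to the other fields' filters is the right building block.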

These sites have the right ("USA" OR "Australia") AND ("Genre 0" OR "Genre 1")
facet count behavior:
http://www.zappos.com/tech-accessories~1#!/cases/
http://rozetka.com.ua/notebooks/c80004/filter/producer=asus;25800=20879/
(translated version of this site:
http://translate.google.com/translate?hl=uk&sl=ru&tl=en&u=http%3A%2F%2Frozetka.com.ua%2Fnotebooks%2Fc80004%2Ffilter%2Fproducer%3Dasus%3B25800%3D20879%2F
)

Some terms, like Countries and Genres, can have several values assigned to
a Movie, and at the same time allow filtering on multiple term values using
"OR" logic.
Some terms, like Year, can have only one value assigned to a Movie, but
still allow filtering on multiple term values using "OR" logic.
At the same time, "AND" logic is used between different terms, as shown above.
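The logic above can be sketched as a small Python function that builds the Case C filter from a selection: one terms filter per field (OR between its values, using the default terms execution) combined with an "and" filter (AND between fields). The field names come from the queries in this thread; build_filter, the FIELDS map and the match_all fallback for an empty selection are assumptions for illustration.

```python
import json

# Hypothetical map from taxonomy name to the indexed field used in this thread.
FIELDS = {"genres": "genres.title", "countries": "countries.title", "year": "year"}


def build_filter(selection):
    """OR between the values of one field, AND between fields (Case C)."""
    clauses = [
        # Default terms execution: a document matches if it has any listed value.
        {"terms": {FIELDS[name]: sorted(values)}}
        for name, values in sorted(selection.items())
        if values
    ]
    # Assumed fallback so that an empty selection matches every document.
    return {"and": clauses} if clauses else {"match_all": {}}


if __name__ == "__main__":
    selection = {"genres": {"Genre 0", "Genre 1"}, "countries": {"USA", "Australia"}}
    body = {"query": {"filtered": {"query": {"match_all": {}},
                                   "filter": build_filter(selection)}}}
    print(json.dumps(body, indent=2))
```

Sorting fields and values keeps the generated JSON stable, which helps if the filter is cached.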

How can I write the filter query or facet configuration so that it displays
the right counts for ("USA" OR "Australia") AND ("Genre 0" OR "Genre 1"),
showing either the number of results that will be displayed after the
filter is applied, or the positive delta (like +3 results) if the term
already has some values filtered?

Thank you,
Alex

--

Hi Alex,

You can apply filters in different scopes. In your example you defined
the filter inside the query (query scope), which affects both the query
results and the facets that use the default scope (query). If you define
your filter as a top-level filter (outside the query), then only the
query results are affected by the filter, not the facets. You can then
define a facet_filter on each facet (e.g. the genre facet's facet_filter
contains all the other filters, but not the genre filter itself).

I think this will give you the required result. The facet and filter
interaction is described at:
http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
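A minimal Python sketch of the request described above, assuming the 0.x search API in which a top-level "filter" restricts only the returned hits and each facet can carry its own "facet_filter". build_request, terms_clause and the FIELDS map are hypothetical helpers. Each facet's facet_filter holds the filters of all other fields, so a genre count is the number of hits you would get if that genre were the only one selected.

```python
import json

# Hypothetical map from taxonomy name to the indexed field used in this thread.
FIELDS = {"genres": "genres.title", "countries": "countries.title"}


def terms_clause(name, values):
    return {"terms": {FIELDS[name]: sorted(values)}}


def build_request(selection):
    """Top-level filter for the hits, per-facet facet_filter for the counts."""
    active = {n: v for n, v in sorted(selection.items()) if v}
    facets = {}
    for name, field in FIELDS.items():
        facet = {"terms": {"field": field, "size": 1000, "order": "term"}}
        # Every selected filter except the one on this facet's own field.
        others = [terms_clause(n, v) for n, v in active.items() if n != name]
        if others:
            facet["facet_filter"] = {"and": others}
        facets[name] = facet
    request = {"query": {"match_all": {}}, "facets": facets, "size": 1000}
    if active:
        # Applied to the hits only; facets are computed without it.
        request["filter"] = {"and": [terms_clause(n, v) for n, v in active.items()]}
    return request


if __name__ == "__main__":
    print(json.dumps(build_request({"genres": {"Genre 0", "Genre 1"},
                                    "countries": {"USA", "Australia"}}), indent=2))
```

For a field that already has values selected, this counts hits for the value on its own rather than a "+3" style delta. Getting the delta would additionally require excluding documents that already contain one of that field's selected values, for example with a not filter inside that facet's facet_filter; that extension is an untested assumption.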

Martijn

On 9 December 2012 18:34, Alex Zelid alex@zelid.com wrote:


--
Met vriendelijke groet,

Martijn van Groningen

--

Hi Martijn,

I read the API docs before asking the question, and I tried different scopes
and found that I really need the query scope (the filter should affect both
the results and the facets).

In my previous email I provided 3 different query examples.

All of them use query-scope filters (affecting both facet counts and
results); I guess this is what I need, since the filter has to apply to
both results and facets.

And the facet counts work fine, BUT only until you start using "OR" logic
between the values of one term.
As soon as you filter for something like "give me all movies with country
in ("USA" OR "Australia") AND genre in ("Genre 0" OR "Genre 1")" instead of
country in ("USA" AND "Australia") AND genre in ("Genre 0" AND "Genre 1"),
you start getting invalid facet counts. I don't know the reason; it looks
like facet counts can't handle "OR" logic between several filter values of
one term.
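One possible explanation, sketched in Python under the assumption that a query-scoped terms facet simply counts each value among the current hits (the toy movies and helper names are hypothetical): with "OR" logic, selecting another value of an already-filtered term adds documents that are not in the current hits, so the facet never sees them. For a term with no selection yet the two numbers still agree, as in Case A.

```python
# Hypothetical toy data standing in for the movie index.
MOVIES = [
    {"genres": {"Genre 0"}, "countries": {"USA"}},
    {"genres": {"Genre 1"}, "countries": {"USA"}},
    {"genres": {"Genre 1"}, "countries": {"Australia"}},
]


def matches_or(movie, selection):
    """Case C semantics: OR within a field, AND across fields."""
    return all(movie[f] & values for f, values in selection.items() if values)


def scoped_facet_count(selection, field, value):
    """Query-scoped terms facet: counts `value` only among the current hits."""
    return sum(1 for m in MOVIES if matches_or(m, selection) and value in m[field])


def hits_after_adding(selection, field, value):
    """Number of hits after adding `value` to the OR filter on `field`."""
    expanded = dict(selection)
    expanded[field] = selection.get(field, set()) | {value}
    return sum(1 for m in MOVIES if matches_or(m, expanded))


if __name__ == "__main__":
    sel = {"genres": {"Genre 0"}, "countries": {"USA"}}
    # The facet reports 0 for "Genre 1", but selecting it would give 2 hits.
    print(scoped_facet_count(sel, "genres", "Genre 1"),
          hits_after_adding(sel, "genres", "Genre 1"))
```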

Case A - always displays the right results and the right facet counts
(filter on one value per term, e.g. country=("USA") AND genre=("Genre 0"))
Case B - always displays the right results and the right facet counts
(filter on several values per term using "AND" logic, e.g.
country=("USA" AND "Australia") AND genre=("Genre 0" AND "Genre 1"))
Case C - always displays the right results but invalid facet counts
(filter on several values per term using "OR" logic, e.g.
country=("USA" OR "Australia") AND genre=("Genre 0" OR "Genre 1"))

How should I define the filters so that they apply to both results and
facets, and give the right facet counts for "Case C"?

Thank you,
Alex

--