Facet counts don't match real values


(Tania) #1

Hi,
I am working with elasticsearch and faceted search. It worked well at
first, but after more use and testing various cases, I have noticed
that the count returned by the es server does not always match the
expected value, and I would like to know whether it is my fault
because I am not using it properly.
Consider the following example.
I define an analyzer that splits on semicolons to extract each of the
terms for faceting:
curl -XPOST 'http://localhost:9200/test/' -d '{
"settings" : {"analysis" : {"analyzer" : {"semicolon" : {"type" :
"pattern", "pattern": ";"}}}},
"mappings" : {"news" : {"properties" : {"tags_an" : {"type" :
"string", "analyzer": "semicolon"}}}}
}'
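
To make the effect of this analyzer concrete, here is a rough Python approximation of the terms it would produce for one field value. It is only a sketch: it assumes the pattern analyzer lowercases its tokens by default and does not trim whitespace around them, and the sample value is hypothetical (built from the tags that appear in the example response further down).

```python
import re

def semicolon_terms(value, pattern=";", lowercase=True):
    """Approximate the terms a pattern analyzer would emit for one field value."""
    if lowercase:
        # Assumption: the pattern analyzer lowercases tokens by default.
        value = value.lower()
    # Split on the delimiter and drop empty tokens; whitespace is not trimmed.
    return [token for token in re.split(pattern, value) if token]

# Hypothetical field value; each whole tag becomes a single term.
terms = semicolon_terms("Innovation;Open Government;Science")
print(terms)
```

Because each tag stays in one piece, a terms facet on tags_an can report "open government" as a single entry instead of separate "open" and "government" entries.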

curl -XGET http://localhost:9200/test/_search?pretty=true -d '{
"query": {
"query_string" :{
"fields" : ["title", "description", "tags"],
"query": "xxx"
}
},
"facets": {
"tags": {
"terms": {
"field" : "tags_an"
}
}
}
}'

All the facets returned by the es server are presented to the user to
help her narrow the results in her next search.
Imagine the server returns the following results for the previous query:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total":20
...
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 15,
"terms" : [ {
"term" : "innovation",
"count" : 10
}, {
"term" : "open government",
"count" : 4
}, {
"term" : "science",
"count" : 2
} ]
}
}
}

And the user is interested in the "open government" facet. So she
clicks on it and a new request is sent to the es server:
curl -XGET http://localhost:9200/test/_search?pretty=true -d '{
"query": {
"query_string" :{
"fields" : ["title", "description", "tags"],
"query": "xxx AND tags:open government"
}
},
"facets": {
"tags": {
"terms": {
"field" : "tags_an"
}
}
}
}'
But now, surprisingly, the number of hits returned is not 4, as
expected, but 6!
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total":6
...
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 15,
"terms" : [ {
"term" : "open government",
"count" : 5
}, {
"term" : "science",
"count" : 2
} ]
}
}
}

In many cases the returned result matches the expected value, but
when the selected value contains spaces or special characters the
result is not always correct. Am I making an error in the query
string? Should I escape whitespace? I have used faceted search in
other projects but I have never seen this behaviour before.
Any help will be appreciated!
Thanks in advance!


(Clinton Gormley) #2

Hi Tania

And the user is interested in the "open government" facet. So she
clicks in it and a new request is generated to the es server:
curl -XGET http://localhost:9200/test/_search?pretty=true -d '{
"query": {
"query_string" :{
"fields" : ["title", "description", "tags"],
"query": "xxx AND tags:open government"

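This is actually a query for:

"xxx" AND "tags:open" OR "government"

(the tags: prefix only applies to the term immediately after it, so
government is searched in the default fields).

You could change it to:

 "query": "xxx AND tags:\"open government\""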

Or alternatively:

curl -XGET http://localhost:9200/test/_search?pretty=true -d '{
"query" : {
"filtered" : {
"filter" : {
"term" : {
"tags" : "open government"
}
},
"query" : {
"query_string" : {
"fields" : [ "title","description" ],
"query" : "xxx"
}
}
}
}
}
'
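
Both suggestions above can be sketched in Python as helpers that build the request pieces from a clicked facet value. This is a minimal sketch, not code from the thread: the function names are hypothetical, and wrapping the original query text in parentheses is an extra precaution of this sketch.

```python
import json

def quote_phrase(value):
    """Wrap a facet value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def narrowed_query_string(base_query, field, value):
    """Option 1: keep query_string, but bind the whole phrase to the field."""
    return "({}) AND {}:{}".format(base_query, field, quote_phrase(value))

def filtered_search(base_query, fields, filter_field, filter_value):
    """Option 2: leave the text query alone and add a term filter."""
    return {
        "query": {
            "filtered": {
                "query": {"query_string": {"fields": fields, "query": base_query}},
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }

print(narrowed_query_string("xxx", "tags", "open government"))
body = filtered_search("xxx", ["title", "description"], "tags", "open government")
print(json.dumps(body, indent=2))
```

The second form avoids string escaping entirely, because the facet value travels as a JSON string rather than as part of the query syntax.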

clint


(Shay Banon) #3

Adding to what Clinton said, filters are by far the preferred way to go
about doing it.

On Tue, Aug 2, 2011 at 1:47 PM, Clinton Gormley clint@traveljury.com wrote:



(Tania) #4

Thanks a lot! I knew it had to be my fault... elasticsearch never fails! :)


(Tania) #5

Hi again!
Following your recommendation to use filters instead of building
large and error-prone queries, I have tried using filters. Everything
went OK (fantastic documentation!) but, oh dear, I suspect that facets
are calculated on all the matches; that is, the filtered-out ones are
not discarded. Is this right?
I need to extract facets for all my searches, so even if filtering is
the appropriate solution, I think it doesn't fit my case. Or is there
another type that I could use to combine filters and facets (and
obtain facets only for the filtered results)?
Thanks!



(Ivan Brusic) #6

Not sure if I completely understand your question/scenario, but you might
be looking for facet filters:
http://www.elasticsearch.org/guide/reference/api/search/facets/filter-facet.html

Facet filters will "discard" the filtered matches from the facets.
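
As an illustration of restricting what a facet counts, the sketch below attaches a filter to a terms facet. It assumes the facet_filter key that the facets API of that era accepted next to the facet type (worth checking against the linked documentation, which describes the related filter facet); the helper name is hypothetical.

```python
import json

def terms_facet(field, facet_filter=None):
    """Build a terms facet, optionally restricted by its own filter."""
    facet = {"terms": {"field": field}}
    if facet_filter is not None:
        # Assumption: facets accept a sibling "facet_filter" element.
        facet["facet_filter"] = facet_filter
    return facet

body = {
    "query": {"query_string": {"fields": ["title", "description"], "query": "xxx"}},
    "facets": {
        "tags": terms_facet("tags_an", {"term": {"tags_an": "open government"}})
    },
}
print(json.dumps(body, indent=2))
```

The filter here only affects the documents counted by that one facet; it does not change the hits returned by the query.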

--
Ivan



(Clinton Gormley) #7

Hi Tania


I think it is likely that you are doing something wrong.

Please gist (http://gist.github.com/gists) an example of what you are
doing, the results you are getting, and the results you would like to
get.

clint


(Shay Banon) #8

If I understood the question: is the issue the fact that a filtered query
causes facets to be computed on the filtered result set? If you don't
want this behavior, then use the top level filter element:
http://www.elasticsearch.org/guide/reference/api/search/filter.html.
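
The difference between the two placements can be sketched by building both request bodies side by side. This is only an illustration with hypothetical helper names; the field names are the ones used earlier in the thread.

```python
def with_filtered_query(text, fields, flt, facets):
    """Filter inside the query: hits and facets both see only filtered docs."""
    return {
        "query": {
            "filtered": {
                "query": {"query_string": {"fields": fields, "query": text}},
                "filter": flt,
            }
        },
        "facets": facets,
    }

def with_top_level_filter(text, fields, flt, facets):
    """Top-level filter: hits are filtered, facets still cover the full query."""
    return {
        "query": {"query_string": {"fields": fields, "query": text}},
        "filter": flt,
        "facets": facets,
    }

flt = {"term": {"tags_an": "open government"}}
facets = {"tags": {"terms": {"field": "tags_an"}}}
narrowed = with_filtered_query("xxx", ["title", "description"], flt, facets)
unnarrowed = with_top_level_filter("xxx", ["title", "description"], flt, facets)
```

For drill-down navigation, where the facet counts should shrink together with the hits, the first form is the one that matches what Tania describes.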



(Tania) #9

First of all, yesterday I didn't fully understand what you proposed
about using filters, and I was implementing
http://www.elasticsearch.org/guide/reference/api/search/filter.html,
when what I actually need is the filtered query:
http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query.html.

Now I have reimplemented it with a filtered query, and I think that
while creating the gist I discovered what is happening, but I still
don't understand it.

Here is a little example with all the responses: https://gist.github.com/1122262

(1) Because I want to obtain tag facets for each search and tags can
consist of several words, I need an analyzer that splits on a
delimiter (in this case a semicolon) so that facets are calculated on
the whole tag.
So tags are indexed twice: once 'normal' and once with the analyzer.

To help the user find the appropriate result, I present all the
facets obtained in the previous search. The user can choose any facet,
and then a new request is sent to the es server to recalculate the
results, now taking into account both the first query and the
selected facet. Imagine the user clicks on "West Asia" (steps 4, 5
and 6).

(4) In step 4 of the gist I show how I was doing this faceted search
before talking to you (the approach you do not recommend).

(5) In step 5 I try to do it with a filtered query (notice that I use
an 'and' filter because the user could continue clicking on more
facets and narrowing the search), but I apply it to the normal field
(not analyzed). No results are returned! Why is this happening?

(6) I repeat the same search but filter on the analyzed field
(tags_analyzed). Now the results are what I expected!

I think that I have found the solution that my app needs, but I don't
understand why this is happening.
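
One plausible explanation for steps 5 and 6 can be simulated in a few lines of Python. The sketch rests on assumptions not confirmed in the thread: that the 'normal' tags field is in fact analyzed with the default analyzer (approximated here as lowercasing and splitting on non-word characters), that the semicolon analyzer lowercases its tokens, and that a term filter compares its value to the indexed terms without analyzing it. The "Energy" tag is a hypothetical second value.

```python
import re

def standard_like_terms(value):
    """Rough stand-in for the default analyzer: lowercase, split on non-words."""
    return re.findall(r"\w+", value.lower())

def semicolon_terms(value):
    """Rough stand-in for the semicolon pattern analyzer (lowercasing assumed)."""
    return [token for token in value.lower().split(";") if token]

def term_filter_matches(indexed_terms, filter_value):
    """A term filter needs an indexed term identical to its (unanalyzed) value."""
    return filter_value in indexed_terms

doc_tags = "West Asia;Energy"              # hypothetical document value
plain = standard_like_terms(doc_tags)      # terms: west, asia, energy
by_semicolon = semicolon_terms(doc_tags)   # terms: west asia, energy

print(term_filter_matches(plain, "West Asia"))         # step 5: no match
print(term_filter_matches(by_semicolon, "west asia"))  # step 6: match
```

Under these assumptions the plain field never contains a single term "West Asia", so the filter in step 5 finds nothing, while the semicolon field holds the whole tag as one term, which is also the form in which the facet returns it.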

I need to store tags twice, otherwise facets are not calculated
properly, and if I go ahead with this solution I would need to store
the tags field three times (because my app is in Spanish and I have
to store the tags both with and without accent marks!)

Is this all correct? What could be improved? Why does elasticsearch
behave so differently on analyzed and non-analyzed fields?

Thanks, thanks, thanks!


