Hi,
I am working with elasticsearch and faceted search. It worked well at
first, but after testing more cases I have noticed that the count
returned by the es server does not always match the expected value. I
would like to know whether this is my fault because I am not using it
properly.
Consider the following example:
I define an analyzer that splits on semicolons to extract each of the
terms for faceting:
curl -XPOST http://localhost:9200/test/ -d '{
"settings" : {"analysis" : {"analyzer" : {"semicolon" : {"type" :
"pattern", "pattern": ";"}}}},
"mappings" : {"news" : {"properties" : {"tags_an" : {"type" :
"string", "analyzer": "semicolon"}}}}
}'
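As a rough illustration of the tokens this analyzer is expected to
produce, here is a minimal Python sketch. It assumes the pattern
analyzer splits on the regular expression, lowercases each token by
default and drops empty tokens. The function name semicolon_analyze and
the sample tags value are hypothetical and not taken from the real index.

```python
import re


def semicolon_analyze(value, pattern=";", lowercase=True):
    """Approximate a pattern analyzer: split on the regex, drop empty tokens.

    Assumption: tokens are lowercased by default, as the pattern
    analyzer is understood to do. Whitespace inside a token is kept.
    """
    tokens = [t for t in re.split(pattern, value) if t]
    return [t.lower() for t in tokens] if lowercase else tokens


# Hypothetical tags value, used only for illustration.
sample = "Innovation;Open Government;Science"
print(semicolon_analyze(sample))
# ['innovation', 'open government', 'science']
```

Under these assumptions "open government" stays a single term in
tags_an, which is why it appears as one facet entry.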
Then I search with a terms facet on that field:
curl -XGET 'http://localhost:9200/test/_search?pretty=true' -d '{
"query": {
"query_string" :{
"fields" : ["title", "description", "tags"],
"query": "xxx"
}
},
"facets": {
"tags": {
"terms": {
"field" : "tags_an"
}
}
}
}'
All the facets returned by the es server are presented to the user to
help her narrow the results in the next search.
Suppose the server returns the following results for the previous query:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total":20
...
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 15,
"terms" : [ {
"term" : "innovation",
"count" : 10
}, {
"term" : "open government",
"count" : 4
}, {
"term" : "science",
"count" : 2
} ]
}
}
}
The user is interested in the "open government" facet, so she clicks
on it and a new request is sent to the es server:
curl -XGET 'http://localhost:9200/test/_search?pretty=true' -d '{
"query": {
"query_string" :{
"fields" : ["title", "description", "tags"],
"query": "xxx AND tags:open government"
}
},
"facets": {
"tags": {
"terms": {
"field" : "tags_an"
}
}
}
}'
But now, surprisingly, the number of hits returned is not 4, as
expected, but 6!!
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total":6
...
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 15,
"terms" : [
{
"term" : "open government",
"count" : 5
}, {
"term" : "science",
"count" : 2
} ]
}
}
}
In many cases the returned count matches the expected value, but when
the requested value contains spaces or special characters the result
is not always correct. Am I making an error in the query string?
Should I escape whitespace? I have used faceted search in other
projects and have not seen this behaviour before.
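One thing that may be worth checking is how the query string is
parsed. In Lucene query syntax a field prefix applies only to the term
directly after it, so in tags:open government the word "government"
would be searched in the default fields rather than in tags. The
Python sketch below builds the same request with the facet value
quoted as a phrase. The helper names facet_clause and drill_down_body
are hypothetical, and whether the phrase form returns exactly the
facet count is an assumption to verify, since tags and tags_an are
analyzed differently.

```python
import json


def facet_clause(field, value):
    """Return field:"value", escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '{}:"{}"'.format(field, escaped)


def drill_down_body(text, field, value):
    """Build the search body for a facet click (hypothetical helper)."""
    return {
        "query": {
            "query_string": {
                "fields": ["title", "description", "tags"],
                "query": "{} AND {}".format(text, facet_clause(field, value)),
            }
        },
        "facets": {"tags": {"terms": {"field": "tags_an"}}},
    }


body = drill_down_body("xxx", "tags", "open government")
print(json.dumps(body, indent=2))
```

The resulting query string is xxx AND tags:"open government", which
keeps both words bound to the tags field.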
Please, any help will be appreciated!
Thanks in advance!