During indexing you are using the standard tokenizer, which splits words on "-".
So, 'Mercedes-Benz' is indexed like this:
$ curl -s \
    "localhost:9200/courses/_analyze?analyzer=autocomplete_analyzer&pretty=true" \
    -d "Mercedes-Benz" | grep "token"
"token" : "me",
"token" : "mer",
"token" : "merc",
"token" : "merce",
"token" : "merced",
"token" : "mercede",
"token" : "mercedes",
"token" : "be",
"token" : "ben",
"token" : "benz",
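To make the token list above easier to reason about, here is a minimal Python sketch that mimics the index-time analysis. It is not Elasticsearch code: the helper name autocomplete_tokens, the regex split standing in for the standard tokenizer, and the minimum gram length of 2 are assumptions inferred from the output above.

```python
import re

def autocomplete_tokens(text, min_gram=2, max_gram=20):
    """Rough stand-in for the index-time analyzer (hypothetical helper).

    Assumes: split on non-alphanumeric characters (approximating the
    standard tokenizer), lowercase, then emit edge n-grams of each word.
    """
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        upper = min(len(word), max_gram)
        for size in range(min_gram, upper + 1):
            tokens.append(word[:size])
    return tokens

# The hyphen is dropped by the split, so no token ever contains "-".
print(autocomplete_tokens("Mercedes-Benz"))
```

Note that the whole string "mercedes-benz" never appears among the tokens, which is what matters for the search side.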
For the search part you are using the keyword tokenizer, which doesn't
split the input at all, so the query for "mercedes-benz" is translated
into a query for the single term "mercedes-benz":
$ curl -s -X GET \
    'http://localhost:9200/courses/course/_validate/query?pretty=true&explain=true' \
    -d '{
      "query_string": {
        "query": "mercedes-benz",
        "fields": [
          "name"
        ]
      }
    }' | grep "explanation"
"explanation" : "name:mercedes-benz"
There is no token "mercedes-benz" in the index, so you get no results. When
you replace "-" with a space, the query_string parser splits the query into
two parts:
$ curl -s -X GET \
    'http://localhost:9200/courses/course/_validate/query?pretty=true&explain=true' \
    -d '{
      "query_string": {
        "query": "mercedes benz",
        "fields": [
          "name"
        ]
      }
    }' | grep "explanation"
"explanation" : "name:mercedes name:benz"
It searches for the term mercedes OR the term benz, and you get the expected
result. But because of this "OR", you also find "Being Cool..." when
you change your query to "mercedes be".
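The three outcomes described so far can be reproduced with a toy model in Python. This is only a sketch: the helper names, the title of the first document, and the set-based term lookup are assumptions made for illustration, and the real query_string parser does much more.

```python
import re

def index_terms(title, min_gram=2):
    """Edge n-grams of each lowercased word (assumed index-time analysis)."""
    terms = set()
    for word in re.findall(r"[a-z0-9]+", title.lower()):
        for size in range(min_gram, len(word) + 1):
            terms.add(word[:size])
    return terms

# Hypothetical two-document index; only the second title is quoted in the thread.
DOCS = {
    "doc1": "Mercedes-Benz",
    "doc2": "Being Cool With Bond, James Bond",
}
INDEX = {doc_id: index_terms(title) for doc_id, title in DOCS.items()}

def search_any(query_terms):
    """Return ids of documents containing at least one term (OR semantics)."""
    return sorted(
        doc_id for doc_id, terms in INDEX.items()
        if any(term in terms for term in query_terms)
    )

print(search_any(["mercedes-benz"]))     # keyword tokenizer: one unsplit term -> []
print(search_any(["mercedes", "benz"]))  # split on the space -> ['doc1']
print(search_any(["mercedes", "be"]))    # OR lets "be" hit "Being" -> ['doc1', 'doc2']
```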
To fix it, you should first replace the query_string query with something
that won't interfere with your tokenization. The Match query
(http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html)
might be a good candidate. That still leaves the mismatch between the
search tokenizer and the index tokenizer to be addressed. There are a few
options here. The simplest one is to replace the search analyzer with the
standard analyzer with no stop words. That will work in most cases. The
only potential issue is that it disregards word order in your search, so
it will also find "Being Cool" when you search for "Cool Being". I'm not
sure whether that is something you want to avoid.
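As a rough way to see the effect of that last option, the Python sketch below splits the query the same way the index does and then looks each term up on its own. Everything here is an assumption for illustration: the helper names, the regex split standing in for the standard analyzer, and the require_all flag, which is a hypothetical stand-in for requiring every term instead of any term.

```python
import re

def edge_ngrams(title, min_gram=2):
    """Assumed index-time analysis: lowercase words expanded to edge n-grams."""
    return {
        word[:size]
        for word in re.findall(r"[a-z0-9]+", title.lower())
        for size in range(min_gram, len(word) + 1)
    }

def search_terms(query):
    """Assumed search-time analysis: split like the index does, no n-grams."""
    return re.findall(r"[a-z0-9]+", query.lower())

def matches(query, title, require_all=False):
    """Check a query against one title; term order is never consulted."""
    indexed = edge_ngrams(title)
    hits = [term in indexed for term in search_terms(query)]
    return all(hits) if require_all else any(hits)

# The hyphenated query now splits into terms that exist in the index.
print(matches("mercedes-benz", "Mercedes-Benz"))  # True
# Word order is ignored, so the reversed query still matches.
print(matches("Cool Being", "Being Cool With Bond, James Bond"))  # True
# Requiring every term is what would keep "be" from matching on its own.
print(matches("mercedes be", "Being Cool With Bond, James Bond", require_all=True))  # False
```

If word order does matter to you, this model suggests it has to be handled by something other than independent term lookups.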
On Tuesday, January 15, 2013 3:09:58 AM UTC-5, Jonathan Evans wrote:
I am trying to configure elasticsearch for autocomplete and have been
quite successful in doing so, however there are a couple of behaviours I
would like to tweak if possible.
- When searching for 'Mercedes-Benz' no results are returned with the
  current setup even though one of the indexed items contains the term.
  'mercedes benz' 'merc' and 'benz' all match the right item as expected.
- When searching for 'Mercedes-Be' I get a superfluous result: "Being
  Cool With Bond, James Bond". The term is obviously being broken into
  'mercedes' and 'be', the latter matching the start of "Being" however I
  would rather the second word act to further limit the results presented to
  the user (as is probably expected).
The results, settings and mapping are listed in the following gist:
Demonstrates two unwanted results with current elasticsearch setup. · GitHub
Could anyone offer any guidance on how to fix these issues?
Cheers,
Jon