I'm a noob using ES 1.5.2. I want to ngram-analyze a field at index time, but do no analysis at search time. Why? I want a user to be able to search for "group" and match the field value "aged grouper" (no wildcards required - but still supported). However, if the user enters "aged grouper" I only want to match documents where my search field contains (at least) that entire phrase.
I created an ngram analyzer that I map to the field at index time, and a "dummy analyzer" (to keep the whole phrase together as one token) that I map to the field at search time. I can test both analyzers using the analyze api and see that text gets tokenized the way I expect by each.
Everything seems correct. However, when I do my query_string search, the search text still gets tokenized into words. So, searching for "group" DOES find "aged grouper" but searching for "the group" finds all documents that have EITHER "the" OR "group" in them. I want the whole phrase to be used in the search.
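To make that concrete, here's a tiny hypothetical example (index name, type name, and doc are placeholders, not my real data):

curl -XPUT 'localhost:9200/myindex/mytype/1' -d '{ "tfield" : "aged grouper" }'
# want: searching "group"        -> matches doc 1 (substring of "grouper")
# want: searching "aged grouper" -> matches doc 1 (field contains the whole phrase)
# want: searching "the group"    -> does NOT match doc 1 (field lacks that whole phrase)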
I'm confused that when I use the analyze api and the validate api I seem to get two different answers (I think):
If I use the analyze api: _analyze?analyzer=dummy_analyzer&text=Hello there
..
<token>Hello there</token>
<== looks correct
..
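For reference, the exact call I'm running ("myindex" stands in for my real index; a custom analyzer only resolves against the index that defines it):

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=dummy_analyzer&text=Hello+there&pretty'
# comes back as a single token covering the whole input, roughly:
# { "tokens" : [ { "token" : "hello there", "start_offset" : 0, "end_offset" : 11, ... } ] }
# (note: the pattern analyzer lowercases by default, hence "hello there")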
However, if I use the validate api:
_validate/query?pretty=true&explain=true&analyzer=dummy_analyzer
{ "query" : {
"query_string" : {
"query" : "Hello there",
"default_field" : "tfield",
"analyzer" : "dummy_analyzer"
}
}
}
results in:
<explanation>props.tfield:Hello props.tfield:there</explanation>
<== looks INCORRECT (breaking phrase apart)
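My working guess is that the query-string parser splits on whitespace before the analyzer ever runs, so dummy_analyzer is applied, but to each chunk separately. If that's right, quoting the text should keep it together. I haven't verified this:

curl -XGET 'localhost:9200/myindex/_validate/query?pretty=true&explain=true' -d '{
  "query" : {
    "query_string" : {
      "query" : "\"Hello there\"",
      "default_field" : "tfield",
      "analyzer" : "dummy_analyzer"
    }
  }
}'
# untested: with the embedded quotes I'd expect the explanation to show a single
# term for the whole phrase (e.g. tfield:"hello there") instead of two terms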
My config is below. My questions:
- Can someone explain the differences between the api results?
- Why isn't the search using the dummy_analyzer (would you expect this approach to work)?
- Is there a better way to have a field analyzed at index time but not at search time, rather than using my kludged dummy_analyzer?
Thanks very much for any insight! -J
"analysis":{
"analyzer":{
"ngram_analyzer":{
"type":"custom",
"tokenizer":"ngram_tokenizer"
},
"dummy_analyzer":{
"type":"pattern",
"pattern":"00xyzzy00" <-- a dummy string trying to never separate words
}
},
"tokenizer":{
"ngram_tokenizer": {
"type":"nGram",
"min_gram":"4",
"max_gram":"500"
}
}
}
"mapping":{
....
"tfield":{
"index_analyzer":"ngram_analyzer",
"search_analyzer":"dummy:analyzer",
"type": "string",
"index","analyzed"
}
....
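Put together as a single create-index request, in case that's easier to reproduce (index name "myindex" and type name "mytype" are placeholders, and the elided parts of my real mapping are left out):

curl -XPUT 'localhost:9200/myindex' -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "ngram_analyzer" : {
          "type" : "custom",
          "tokenizer" : "ngram_tokenizer"
        },
        "dummy_analyzer" : {
          "type" : "pattern",
          "pattern" : "00xyzzy00"
        }
      },
      "tokenizer" : {
        "ngram_tokenizer" : {
          "type" : "nGram",
          "min_gram" : "4",
          "max_gram" : "500"
        }
      }
    }
  },
  "mappings" : {
    "mytype" : {
      "properties" : {
        "tfield" : {
          "type" : "string",
          "index" : "analyzed",
          "index_analyzer" : "ngram_analyzer",
          "search_analyzer" : "dummy_analyzer"
        }
      }
    }
  }
}'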