Analyzer issues using query_string in ES 1.5.2?


(James) #1

I'm a noob using ES 1.5.2. I want to ngram-analyze a field on index, but do no analysis on search. Why? I want a user to be able to search for "group" and match the field "aged grouper" (no wildcards required, but still supported). However, if the user enters "aged grouper" I only want to match documents where my search field contains (at least) that entire phrase.

I created an ngram analyzer that I map to the field for index, and a "dummy analyzer" (to keep the whole phrase together) that I map to the field for search. I can test both analyzers using the analyze api, and see that they are getting tokenized correctly.

Everything seems correct. However, when I do my query_string search, the search text still gets tokenized into words. So, searching for "group" DOES find "aged grouper" but searching for "the group" finds all documents that have EITHER "the" OR "group" in them. I want the whole phrase to be used in the search.

I'm confused because the analyze api and the validate api seem to give me two different answers (I think):

If I use the analyze api: _analyze?analyzer=dummy_analyzer&text=Hello there
..
<token>Hello there</token> <== looks correct
..

However, If I use the validate api:
_validate/query?pretty=true&explain=true&analyzer=dummy_analyzer

{
  "query": {
    "query_string": {
      "query": "Hello there",
      "default_field": "tfield",
      "analyzer": "dummy_analyzer"
    }
  }
}

results in:
<explanation>props.tfield:Hello props.tfield:there</explanation> <== looks INCORRECT (breaking phrase apart)
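
(My current working theory, which I haven't verified against the 1.5 source: the Lucene query parser behind query_string splits the input on whitespace and operators before the analyzer ever runs, so each word gets analyzed on its own. Wrapping the text in quotes should make the parser keep it together as a single phrase:)

```json
{
  "query": {
    "query_string": {
      "query": "\"Hello there\"",
      "default_field": "tfield",
      "analyzer": "dummy_analyzer"
    }
  }
}
```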

My config is below. My questions:

  • Can someone explain the differences between the api results?
  • Why isn't the search using the dummy_analyzer (would you expect this approach to work)?
  • Is there a better way to have a field not analyzed on search only, rather than using my kludged dummy_analyzer?

Thanks very much for any insight! -J

"analysis":{
  "analyzer":{
      "ngram_analyzer":{
          "type":"custom",
          "tokenizer":"ngram_tokenizer"
      },
      "dummy_analyzer":{
        "type":"pattern",
        "pattern":"00xyzzy00"  <-- a dummy string trying to never separate words
      }
   },
   "tokenizer":{
       "ngram_tokenizer": {
           "type":"nGram",
           "min_gram":"4",
           "max_gram":"500"
        }
    }
}

"mapping":{
 ....
   "tfield":{
       "index_analyzer":"ngram_analyzer",
       "search_analyzer":"dummy_analyzer",
       "type": "string",
       "index": "analyzed"
    }
....
Is there a NOOP analyzer?
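
(Partially answering myself: Elasticsearch ships a built-in `keyword` analyzer that emits the entire input as a single token, which might serve as the no-op search analyzer instead of my pattern hack; a sketch, with the rest of the mapping unchanged:)

```json
"tfield": {
    "type": "string",
    "index_analyzer": "ngram_analyzer",
    "search_analyzer": "keyword"
}
```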
(Luca Cavanna) #2

I think these differences may just have to do with using the query_string. May I ask if you tried the match query instead? Or are there features that you need out of the query_string query?
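
For example, a phrase-type match query (a sketch reusing the `tfield` name from your mapping) runs the whole input through the field's search analyzer and then requires the resulting terms as a phrase, instead of OR-ing the words together:

```json
{
  "query": {
    "match": {
      "tfield": {
        "query": "aged grouper",
        "type": "phrase"
      }
    }
  }
}
```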


(James) #3

The syntax of the query_string is ideal for my users. I suppose I could give up on providing wildcards. It was suggested that I abandon query_string and use match (working on that now).

My goal:

  • query "fort" should match "unfortunately" (as if "*fort*" was entered)
  • query "is unfortunate" should only match fields with (at least) that whole phrase
  • query "my fort??e" should match "my fortune" (in the best of all possible worlds)
  • queries should allow for simple AND, OR, NOT, and () grouping logic
  • there can be no fuzziness (only allow exact phrase matches with the constraints above)

I get the leading/trailing wildcard simulation by indexing with an ngram analyzer (I can test with the analyze api and verify it is working correctly).
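
To convince myself the ngram trick works, here's a rough Python emulation of what an nGram tokenizer with min_gram 4 does to a single word (an illustration only, not Elasticsearch's actual code):

```python
def ngrams(text, min_gram=4, max_gram=500):
    """Rough emulation of an nGram tokenizer on a single word:
    emit every substring whose length is in [min_gram, max_gram]."""
    out = []
    for start in range(len(text)):
        for length in range(min_gram, max_gram + 1):
            if start + length > len(text):
                break
            out.append(text[start:start + length])
    return out

tokens = ngrams("unfortunately")
print("fort" in tokens)  # True: the 4-gram "fort" is indexed, so query "fort" matches
```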

Does this seem feasible with a match (minus the mid-term wildcard support)?
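
My plan for the boolean logic (untested) is to wrap phrase-type match clauses in a bool query, again assuming the `tfield` mapping above:

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "tfield": { "query": "is unfortunate", "type": "phrase" } } }
      ],
      "must_not": [
        { "match": { "tfield": { "query": "aged grouper", "type": "phrase" } } }
      ]
    }
  }
}
```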

