Phrase suggester for not_analyzed?


(nainy) #1

Is it possible to use a phrase suggester on fields that are indexed as not_analyzed? For some reason, I keep getting empty results.

Query looks something like:

POST http://localhost:9200/companies/_suggest
{
  "text" : "angel",
  "names" : {
    "phrase" : {
      "field" : "filter.name",
      "direct_generator": [
        {
          "field" : "filter.name",
          "suggest_mode" : "popular",
          "prefix_length": 2,
          "min_word_length": 3
        }
      ]
    }
  }
}

(Nik Everett) #2

Probably not. I haven't tried, but I know that code reasonably well, and it wants to use the analyzer for things. What are you trying to do?


(nainy) #3

Well, I'd like to utilize some sort of error correction functionality and, optionally, an autocomplete. I have the same data indexed with nGrams and morphology plugins, but when I use the suggester on those fields, it basically results in some awkward tokens being returned instead.

What would you recommend?


(Nik Everett) #4

Will the term suggester work for you?
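
For reference, a term suggester request against the same field might look roughly like the sketch below. It assumes the same `companies/_suggest` endpoint and `filter.name` field as the query in #1, and the suggestion name `names` is carried over from there. The option values are illustrative only.

```json
{
  "text": "angel",
  "names": {
    "term": {
      "field": "filter.name",
      "suggest_mode": "popular",
      "prefix_length": 2,
      "min_word_length": 3
    }
  }
}
```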


(nainy) #5

I don't think it works well on fields analyzed with nGrams. I tried it on not_analyzed fields and pretty much got the same, empty result.


(Nik Everett) #6

Sorry! I wouldn't point the suggester at a field that uses the ngram tokenizer either....

What about using the keyword analyzer instead of not_analyzed?
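
A minimal sketch of what that mapping change could look like. The thread does not show the real mapping, so this assumes a `string` field (as in the versions that support `not_analyzed`), a hypothetical type name `company`, and that `filter.name` is a `name` property inside a `filter` object:

```json
{
  "mappings": {
    "company": {
      "properties": {
        "filter": {
          "properties": {
            "name": {
              "type": "string",
              "analyzer": "keyword"
            }
          }
        }
      }
    }
  }
}
```

The keyword analyzer emits the whole field value as a single, unmodified token, so the indexed terms should match what `not_analyzed` produces.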


(nainy) #7

I'll have to look into it, thank you. The reason I keep duplicate, non-analyzed data is to be able to fetch documents with term filters. Would a keyword analyzer affect that?


(Nik Everett) #8

It should look the same as not_analyzed. It's worth testing on a smaller dataset first, though, to confirm that it does and that it helps.


(nainy) #9

I've tried indexing a few documents with a keyword analyzer. I've started getting some results, which are still very far from what I'm trying to achieve.

For example, I have a document with the name 'Music'. If I ask for a suggestion for 'Musi', ES returns 'Music'. But if I type it in lowercase, 'musi', it returns nothing, which is definitely not what I want.

I've tried using suggest-time analyzers with lowercasing, but those don't seem to help. Anything else I can do?


(Nik Everett) #10

Sure! Lots of stuff! It depends on which way you want the suggestions to work:

  1. Use a custom analyzer that has the keyword tokenizer and a lowercase filter. That should make 'musi' return 'music', but it will also make 'Musi' return 'music', because all the tokens are now lowercased.

  2. You may be able to write an analyzer that uppercases the first letter and use it with the analyzer option. I have much less experience with this and don't remember how that code works.

  3. It looks like there is a lowercase_terms option too, which might do something for you as well.

It makes your life much easier if you are OK with 'Musi' suggesting 'music'.
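
As a rough sketch of option 1, index settings and a mapping along these lines define a keyword-tokenizer analyzer with a lowercase filter and apply it to the suggest field. The analyzer name `lowercase_keyword` and the type name `company` are made up for illustration, and the layout of `filter.name` as a `name` property inside a `filter` object is an assumption, since the real mapping isn't shown in this thread:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "company": {
      "properties": {
        "filter": {
          "properties": {
            "name": {
              "type": "string",
              "analyzer": "lowercase_keyword"
            }
          }
        }
      }
    }
  }
}
```

One thing to keep in mind: term filters are not analyzed, so once the indexed tokens are lowercased, a term filter on this field has to use the lowercased value too. If exact-case filtering matters, it may be simpler to keep the not_analyzed copy for filters and point the suggester at a separate lowercased field.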

