Well, I'd like to utilize some sort of error correction functionality and, optionally, an autocomplete. I have the same data indexed with nGrams and morphology plugins, but when I use the suggester on those fields, it basically results in some awkward tokens being returned instead.
I'll have to look into it, thank you. The reason I keep duplicate, non-analyzed data is so that I can fetch documents with term filters. Would a keyword analyzer affect that?
I've tried indexing a few documents with a keyword analyzer. I've started getting some results, which are still very far from what I'm trying to achieve.
For example, I have a document named 'Music'. If I ask for a suggestion for 'Musi', ES returns 'Music'. But if I type it in lowercase, 'musi', it returns nothing, which is definitely not what I want.
I've tried using suggest-time analyzers with lowercasing, but those don't seem to help. Anything else I can do?
Sure! Lots of stuff! It depends on which way you want the suggestions to work:
Use a custom analyzer that has the keyword tokenizer and a lowercase filter. That should make 'musi' return 'music', but it will also make 'Musi' return 'music', because all the indexed tokens are now lowercased.
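A rough sketch of what that first option could look like, written as a Python dict you would send as the index settings. The analyzer name lowercase_keyword is made up for this example, and the helper function is only a local stand-in that mimics what the analyzer would do to a value; it is not part of Elasticsearch.

```python
import json

# Index settings with a custom analyzer: the keyword tokenizer keeps the
# whole field value as a single token, and the lowercase filter then
# lowercases that token. "lowercase_keyword" is a hypothetical name.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

# JSON body to send when creating the index.
settings_json = json.dumps(index_settings, indent=2)


def simulate_lowercase_keyword(text):
    """Local approximation of the analyzer: one token, lowercased."""
    return [text.lower()]
```

With this analyzer on the suggest field, 'Music' would be stored as the single token 'music', which is why both 'musi' and 'Musi' end up suggesting the lowercased form.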
You may be able to write an analyzer that uppercases the first letter and use it with the analyzer option. I have a lot less experience with this and don't remember how that code works.
There also appears to be a lowercase_terms option, which might do something for you as well.
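For reference, here is a minimal sketch of a suggest request with lowercase_terms switched on, assuming you are using the term suggester. The field name 'name' and the label 'name_suggest' are placeholders, and I haven't verified that this alone fixes the 'musi' case.

```python
import json


def build_term_suggest(text, field="name"):
    """Build a term-suggester request body with lowercase_terms enabled.

    "name_suggest" is an arbitrary label for the suggestion, and the
    default field "name" is an assumption about the mapping.
    """
    return {
        "suggest": {
            "name_suggest": {
                "text": text,
                "term": {
                    "field": field,
                    # Lowercase the suggest text terms after analysis.
                    "lowercase_terms": True,
                },
            }
        }
    }


# JSON body that could be sent with a search request.
suggest_body = json.dumps(build_term_suggest("Musi"))
```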
It makes your life much easier if you are OK with 'Musi' suggesting 'music'.