I have a scenario in which I am using an n-gram analyser as the default analyser and the standard analyser as the search analyser in my settings for the field "title".
The word "NOS" is very specific to our application, and I have a few documents whose text starts with "NOS". When I search, I get all the documents back, including the ones that only contain NOS as a substring. My expectation is to get only the documents that contain NOS as a whole word.
For example: when I search for "NOS" I get results containing "diagnosis", "noseband", etc.
The n-gram analyser works really well for the rest of the application in terms of partial matching, but I would like to avoid partial matching for keywords such as NOS, BNN, etc.
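To make the behaviour concrete, here is a rough pure-Python imitation of what happens (not Elasticsearch itself). The min_gram/max_gram values of 3 and 5 and the helper names are assumptions for illustration only; the actual settings are not shown in this thread.

```python
def ngrams(token, min_gram=3, max_gram=5):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def index_terms(text, min_gram=3, max_gram=5):
    """Imitate an n-gram index analyser: lowercase, split on whitespace, emit n-grams."""
    terms = set()
    for word in text.lower().split():
        terms |= ngrams(word, min_gram, max_gram)
    return terms


def search_terms(query):
    """Imitate a standard-like search analyser: lowercase whole words, no n-grams."""
    return set(query.lower().split())


def matches(query, text):
    """A document matches if any search term equals one of its indexed terms."""
    return bool(search_terms(query) & index_terms(text))


for title in ["NOS report", "diagnosis", "noseband", "nothing"]:
    print(title, matches("NOS", title))
```

The search term "nos" is one of the n-grams indexed for "diagnosis" and "noseband", which is why those documents come back.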
Slightly tricky scenario, since you want to go against the default matching
algorithm. There is the keyword marker filter, but it only protects
tokens against later filters that recognize the marker. Stemmers and
synonyms will recognize it, but I doubt n-gram filters will.
One hack would be to apply a pattern replace character filter [1] (which
works pre-tokenization) to convert those words to a nonsensical word
(unicode characters work best) that is shorter than your minimum
n-gram, and then convert them back with a synonym filter. The special words
need to be distinctive enough to be found reliably in the text (all caps is great).
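As a rough illustration of the pre-tokenization replacement step only, here is a pure-Python sketch (not Elasticsearch configuration). The placeholder strings and the protect_keywords helper are hypothetical; in a real setup the placeholders would have to survive your tokenizer and be shorter than your min_gram, and the n-gram and synonym steps are not imitated here.

```python
import re

# Hypothetical placeholders: short, non-English strings unlikely to occur in real text.
PLACEHOLDERS = {"NOS": "ξa", "BNN": "ξb"}

# Case-sensitive, whole-word match, so "diagnosis" and "NOSEBAND" are left alone.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, PLACEHOLDERS)) + r")\b")


def protect_keywords(text):
    """Swap each protected all-caps keyword for its placeholder before tokenization."""
    return _PATTERN.sub(lambda m: PLACEHOLDERS[m.group(1)], text)


print(protect_keywords("NOS diagnosis noseband"))
```

Because the match is case-sensitive and anchored on word boundaries, only the stand-alone all-caps keywords are rewritten, which is why distinctive all-caps words suit this trick.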