Avoid Partial Text Match using n-gram analyser

I have a scenario in which I am using an n-gram analyser as the default (index-time) analyser and the standard analyser as the search analyser in my settings for the field "title".

The word "NOS" is very specific to our application, and I have a few documents whose text starts with "NOS". When I search for it, I get all documents back, including the ones that merely contain NOS as a substring. My expectation is to get only the documents that contain NOS as a whole word.

For example: when I search for "NOS", I get results containing "diagnosis", "noseband", etc.
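To make the cause concrete, here is a toy Python model of the setup (not Elasticsearch itself). It assumes 3-character grams, lowercasing and whitespace tokenization; the real min/max gram sizes are not stated above, and all function names are made up for illustration.

```python
def ngrams(token, min_gram=3, max_gram=3):
    """Return every substring of token with a length from min_gram to max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(text, min_gram=3, max_gram=3):
    """Index side: lowercase, split on whitespace, emit n-grams of each token."""
    terms = set()
    for token in text.lower().split():
        terms |= ngrams(token, min_gram, max_gram)
    return terms


def search_terms(query):
    """Search side: lowercase and split only, no n-grams (stand-in for the standard analyser)."""
    return set(query.lower().split())


def matches(query, text):
    """A document matches when any search term equals any indexed term."""
    return bool(search_terms(query) & index_terms(text))


for title in ["NOS", "diagnosis", "noseband", "heart"]:
    print(title, matches("NOS", title))
```

In this model "diagnosis" and "noseband" both produce the gram "nos" at index time, which is exactly the term the search side produces for "NOS".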

The n-gram analyser works really well for the rest of the application in terms of partial matching, but I would like to avoid partial matching for keywords such as NOS, BNN, etc.

Any help please?

Slightly tricky scenario since you want to go against the default matching
algorithm. There is the keyword marker filter, but it will only protect
tokens against further filters that recognize the marker. Stemmers and
synonyms will recognize it, but I doubt n-gram filters will.

One hack would be to apply a pattern replace character filter [1] (which
works pre-tokenization) to convert those words to a nonsensical word
(Unicode characters work best) that is shorter than your minimum n-gram,
and then convert them back with a synonym filter. These special tokens
need to be unique enough to be found in the text (all caps is great).


Thank you @Ivan for the help. I will try the pattern replace approach and see how it goes.
