How to index words in both their actual and modified forms into Elasticsearch


(Karthik) #1

Hi Team,

I want to index words into Elasticsearch in both their actual form and a modified form.

Example:

The Term "F-35" want to index is "F35" and "F-35", when i search the text F35 or F-35 both should return the document.

Note: I am using the whitespace analyzer here, so the term is not split into two tokens.
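
For reference, a quick check with the _analyze API confirms that the whitespace analyzer keeps the term as a single token:

POST _analyze
{
  "analyzer": "whitespace",
  "text": "F-35"
}

This returns the single token "F-35".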

Could someone please suggest an option to achieve this?


#2

Hi Karthik,

If you want to remove the punctuation as well, you should consider using the standard tokenizer.
If that removes more than you want, you could consider using a pattern analyzer, but be aware that this can be very slow.

Make sure to test your custom analyzers using the _analyze API.
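
For example, you can check what the standard tokenizer produces for your sample term like this (no index is needed for the built-in tokenizers and analyzers):

POST _analyze
{
  "tokenizer": "standard",
  "text": "F-35"
}

The response lists the emitted tokens together with their offsets and positions.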


(Karthik) #3

Hi SaskiaVola,

Thanks for taking the time to reply.

When we use the standard tokenizer, it splits "F-35" into two different tokens, "F" and "35".
But I am expecting single tokens like "F35" and "F-35".

I guess the pattern analyzer will also split the token on punctuation?


#4

Hi Karthik,

That's correct. So, depending on your data, if you can define a suitable pattern for the cases you're referring to, you could apply a character filter first that removes the hyphen inside words that contain numbers.

Then a query for either "F35" or "F-35" would match documents containing either variant.
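
A minimal sketch of that setup, assuming Elasticsearch 7.x or later (the index, char filter, analyzer, and field names below are only placeholders), could look like this:

PUT /my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_hyphen_before_digit": {
          "type": "pattern_replace",
          "pattern": "(\\p{Alpha})-(\\d)",
          "replacement": "$1$2"
        }
      },
      "analyzer": {
        "whitespace_no_hyphen": {
          "type": "custom",
          "char_filter": ["strip_hyphen_before_digit"],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_field": {
        "type": "text",
        "analyzer": "whitespace_no_hyphen"
      }
    }
  }
}

You can verify the behaviour with the _analyze API:

POST /my-index/_analyze
{
  "analyzer": "whitespace_no_hyphen",
  "text": "F-35 F35"
}

Both inputs come out as the token "F35". Since the same analyzer is used at index and at search time by default, a match query for either "F-35" or "F35" against text_field finds the document.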

Hope that works for you.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.