How to index words in both their actual and modified forms into Elasticsearch


(Karthik) #1

Hi Team,

I want to index words into Elasticsearch in both their actual form and a modified form.

Example:

The Term "F-35" want to index is "F35" and "F-35", when i search the text F35 or F-35 both should return the document.

Note: I am using the whitespace analyzer here, so the term is not split into two tokens.
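
For reference, a quick check with the _analyze API confirms that the whitespace analyzer keeps the term as a single token:

POST _analyze
{
  "analyzer": "whitespace",
  "text": "F-35"
}

This returns the single token "F-35".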

Could someone please suggest an option to achieve this?


#2

Hi Karthik,

If you want to remove the punctuation as well, you should consider using the standard tokenizer.
If that removes more than you want, you could consider using a pattern analyzer, but be aware that this can be very slow.

Make sure to test your custom analyzers using the _analyze API.
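
For example, you can check what the standard tokenizer produces for your sample term like this (no index is needed for the built-in tokenizers and analyzers):

POST _analyze
{
  "tokenizer": "standard",
  "text": "F-35"
}

The response lists the emitted tokens together with their offsets and positions.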


(Karthik) #3

Hi SaskiaVola,

Thanks for taking the time to reply.

When we use the standard tokenizer, it splits "F-35" into two different tokens, "F" and "35".
But I am expecting single tokens like "F35" and "F-35".

I guess the pattern analyzer will also split the token on punctuation?


#4

Hi Karthik,

That's correct. So, depending on your data, if you can define a suitable pattern for the cases you're referring to, you could apply a character filter first that removes the hyphen inside words that contain numbers.

Then a query for either "F35" or "F-35" would match documents containing either variant.
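
A minimal sketch of that setup, assuming Elasticsearch 7.x or later (the index, char filter, analyzer, and field names below are only placeholders), could look like this:

PUT /my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_hyphen_before_digit": {
          "type": "pattern_replace",
          "pattern": "(\\p{Alpha})-(\\d)",
          "replacement": "$1$2"
        }
      },
      "analyzer": {
        "whitespace_no_hyphen": {
          "type": "custom",
          "char_filter": ["strip_hyphen_before_digit"],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_field": {
        "type": "text",
        "analyzer": "whitespace_no_hyphen"
      }
    }
  }
}

You can verify the behaviour with the _analyze API:

POST /my-index/_analyze
{
  "analyzer": "whitespace_no_hyphen",
  "text": "F-35 F35"
}

Both inputs come out as the token "F35". Since the same analyzer is used at index and at search time by default, a match query for either "F-35" or "F35" against text_field finds the document.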

Hope that works for you.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.