Is there a way in Elastic to aggregate the values in a particular field by the similarity between the words?
I mean, if for example I have a field called "userMailbox", I would like to get groups of mailboxes that are similar but not identical.
One approach that might be of interest is to use a phonetic analyzer.
You'd have to do some work to make aggregations work with it, perhaps calling the _analyze API in your client code to get the phonetic tokens and concatenating them into a single keyword value that you index alongside the original field.
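As a rough sketch of that client-side step, the Python below builds a request body for the _analyze API and joins the tokens from its response into one keyword string. The analyzer name `mailbox_phonetic` and the sample token values are made up for illustration; the analyzer would have to be defined in your index settings with a phonetic token filter (from the analysis-phonetic plugin), and the HTTP call itself is left out.

```python
import json

# Hypothetical analyzer name; it must exist in the index settings and use
# a phonetic token filter (provided by the analysis-phonetic plugin).
ANALYZER = "mailbox_phonetic"


def analyze_request_body(text, analyzer=ANALYZER):
    """Build the JSON body for POST /<index>/_analyze."""
    return json.dumps({"analyzer": analyzer, "text": text})


def phonetic_key(analyze_response, sep="_"):
    """Join the tokens of an _analyze response into a single keyword value."""
    tokens = sorted(
        analyze_response.get("tokens", []),
        key=lambda t: t.get("position", 0),
    )
    return sep.join(t["token"] for t in tokens)


# Illustrative response shape only; real token values depend on the encoder.
sample_response = {
    "tokens": [
        {"token": "JN", "position": 0},
        {"token": "SM0", "position": 1},
    ]
}
print(phonetic_key(sample_response))
```

The resulting string could be indexed into a separate keyword field, and a terms aggregation on that field would then bucket mailboxes that share the same phonetic key.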
Treating gmail.co as equivalent to gmail.com is probably some custom normalisation you'd have to do in your client code.
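One possible shape for that normalisation, sketched in Python with the standard library's difflib: the domain part of the address is snapped to the closest entry in a list of known domains when it is similar enough. The domain list, the 0.85 cutoff and the function name are all assumptions to tune against your own data, not anything Elasticsearch provides.

```python
import difflib

# Assumed list of canonical domains; in practice, derive it from your data.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "outlook.com"]


def normalise_mailbox(address, known_domains=KNOWN_DOMAINS, cutoff=0.85):
    """Lowercase the address and map a near-miss domain to a known one."""
    address = address.strip().lower()
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No "@" present, so there is no domain to normalise.
        return address
    match = difflib.get_close_matches(domain, known_domains, n=1, cutoff=cutoff)
    return f"{local}@{match[0]}" if match else address


print(normalise_mailbox("Alice@gmail.co"))
print(normalise_mailbox("bob@example.org"))
```

Addresses whose domain is not close to any known domain are passed through unchanged, so the step is safe to run over every document before indexing.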
Ingest pipelines can help wrap up some of the content-processing logic if you don't want to stand up a custom data-preparation process.
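For a sense of what such a pipeline might look like, here is a small Python sketch that builds a pipeline definition using the built-in lowercase and gsub processors. The single hard-coded gmail.co rule and the description text are illustrative assumptions only; a real pipeline would need rules that match your own data, and the body would be registered with a PUT to _ingest/pipeline/<your-pipeline-id>.

```python
import json

FIELD = "userMailbox"

# Illustrative pipeline body: lowercase the field, then rewrite one known
# near-miss domain. The gsub pattern is a regular expression.
pipeline = {
    "description": "Normalise mailbox addresses before indexing (illustrative)",
    "processors": [
        {"lowercase": {"field": FIELD}},
        {
            "gsub": {
                "field": FIELD,
                "pattern": "@gmail\\.co$",
                "replacement": "@gmail.com",
            }
        },
    ],
}

print(json.dumps(pipeline, indent=2))
```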
Thank you very much Mark,
As I dig deeper into the subject, it seems to me that it would be better to do the text clustering outside of Elastic...
Do you have any advice you can give me on this?