Text similarity with Elasticsearch

Kfiro · October 12, 2020, 8:51am

Hi everyone,

Is there a way in Elastic to aggregate the values of a particular field by how similar they are to each other?
For example, if I have a field called "userMailbox", I would like to get groups of mailboxes that are similar but not identical.

TIA

AClerk · October 12, 2020, 11:17pm

Can you give an example?
What is similar?

Kfiro · October 13, 2020, 8:38am

Hi,
thanks for the response.

Example:
Let's say I want to aggregate all the mailboxes in the userMailbox field. The result should look something like this:

bucket 1:
alex1@gmail.com
alexa@gmail.com
Aleks@gmail.com
alex@gmail.co
aLex@gmail.com

bucket 2:
yanna@aol.com
yana@aol.com
Yona@aox.co

bucket 3:
admin@aol.com
aDmins@aol.com
admin1@aol.com

Mark_Harwood · October 13, 2020, 11:12am

There's no easy out-of-the-box answer to this.

One approach that might be of interest is to use a phonetic analyzer.

You'd have to do some work to make aggregations work with it, perhaps calling the analyze API in your client code to get the phonetic tokens and concatenating them into a single keyword that you index alongside the original value.
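As a rough sketch of that idea (not a tested recipe): the settings below assume the analysis-phonetic plugin is installed, and the names `mailbox_phonetic` and `mailbox_metaphone` are made up for illustration. The helper only shows the "concatenate the tokens into one keyword" step. It takes a response shaped like the output of the analyze API, and the sample tokens are placeholders, not real encoder output.

```python
from typing import Any, Dict, List

# Index settings sketch. Requires the analysis-phonetic plugin.
# "mailbox_phonetic" and "mailbox_metaphone" are hypothetical names.
PHONETIC_SETTINGS: Dict[str, Any] = {
    "settings": {
        "analysis": {
            "filter": {
                "mailbox_metaphone": {
                    "type": "phonetic",
                    "encoder": "metaphone",
                    "replace": True,
                }
            },
            "analyzer": {
                "mailbox_phonetic": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "mailbox_metaphone"],
                }
            },
        }
    }
}


def phonetic_key(analyze_response: Dict[str, Any]) -> str:
    """Join the tokens of an analyze-API style response into one keyword value."""
    tokens: List[str] = [t["token"] for t in analyze_response.get("tokens", [])]
    return "_".join(tokens)


# Response shape only; "AAA" and "BBB" are placeholder tokens.
sample_response = {"tokens": [{"token": "AAA"}, {"token": "BBB"}]}
print(phonetic_key(sample_response))
```

The resulting string could be stored in a separate keyword field, so that a plain terms aggregation on that field groups mailboxes whose phonetic tokens match.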

Treating gmail.co as equivalent to gmail.com is probably some custom normalisation you'd have to do in your client code.
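A minimal sketch of what such client-side normalisation might look like. The `DOMAIN_FIXES` table is a hypothetical, hand-maintained mapping, and you would have to decide which domains really are typos in your own data.

```python
# Hypothetical table of known domain typos and their corrections.
DOMAIN_FIXES = {"gmail.co": "gmail.com"}


def normalise_mailbox(address: str) -> str:
    """Lowercase the address and map known typo domains to their corrected form."""
    cleaned = address.strip().lower()
    local, sep, domain = cleaned.rpartition("@")
    if not sep:
        # No "@" present, so leave the cleaned value as it is.
        return cleaned
    return f"{local}@{DOMAIN_FIXES.get(domain, domain)}"


print(normalise_mailbox("alex@gmail.co"))
```

This only handles case differences and listed domains. It does not make alex1 and alexa fall into the same group, which is where the phonetic key or an external clustering step would come in.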

Ingest pipelines can help wrap up some of the content-processing logic if you don't want to stand up a custom data-preparation process.
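For illustration, a pipeline body along these lines could do the simple normalisation at index time using the built-in `lowercase` and `gsub` processors. The field name is taken from the question; the description and the single gmail.co rule are assumptions. The `simulate` function is only a local approximation for checking the regex, not how Elasticsearch runs the pipeline.

```python
import json
import re

# Body you might send when creating an ingest pipeline (illustrative only).
PIPELINE_BODY = {
    "description": "Normalise userMailbox before indexing (illustrative)",
    "processors": [
        {"lowercase": {"field": "userMailbox"}},
        {
            "gsub": {
                "field": "userMailbox",
                "pattern": r"@gmail\.co$",
                "replacement": "@gmail.com",
            }
        },
    ],
}


def simulate(value: str) -> str:
    """Approximate the two processors locally to sanity-check the pattern."""
    gsub = PIPELINE_BODY["processors"][1]["gsub"]
    return re.sub(gsub["pattern"], gsub["replacement"], value.lower())


print(json.dumps(PIPELINE_BODY, indent=2))
print(simulate("Alex@gmail.co"))
```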

Kfiro · October 13, 2020, 9:41pm

Thank you very much, Mark.
As I dig deeper into the subject, it seems to me that it would be better to do the text clustering outside of Elastic.
Do you have any advice you can give me on this?

system · November 10, 2020, 9:41pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
