Check if similar documents exist?

Antoine94 · January 7, 2018, 9:59pm

Hi,

I'm running some tests with ELK (latest version). Here is my use case:

I'm indexing some documents containing phone activity, where I use these fields:

SYSTEMA.EndUserName
SYSTEMA.EndUserPhoneNumber
SYSTEMA.CounterpartPhoneNumber
SYSTEMA.Call Direction
SYSTEMA.Date

On the other side, I'm also indexing similar documents:

SYSTEMB.EndUserName
SYSTEMB.EndUserPhoneNumber
SYSTEMB.CounterpartPhoneNumber
SYSTEMB.Call Direction
SYSTEMB.Date

My goal is to find out, for each SYSTEMA document, whether a similar SYSTEMB document exists (with the same EndUserName, EndUserPhoneNumber, CounterpartPhoneNumber, and CallDirection) and approximately the same date (within a few seconds).
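To make the matching rule concrete, here is a minimal sketch in plain Python, outside of Elasticsearch. It assumes the documents have already been loaded as dicts with the `SYSTEMA.`/`SYSTEMB.` prefix stripped, that `Date` is a `datetime`, and that "a few seconds" means 5 seconds; the function name `find_matches` and the tolerance value are hypothetical choices for illustration only.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta

# Fields that must be identical in both systems (names taken from the post,
# with the SYSTEMA./SYSTEMB. prefix assumed to be stripped beforehand).
KEY_FIELDS = ("EndUserName", "EndUserPhoneNumber",
              "CounterpartPhoneNumber", "CallDirection")


def find_matches(docs_a, docs_b, tolerance=timedelta(seconds=5)):
    """For each SYSTEMA doc, report whether a SYSTEMB doc has the same key
    fields and a Date within `tolerance` (an assumed value)."""
    # Index SYSTEMB dates by the tuple of key fields.
    b_dates = defaultdict(list)
    for doc in docs_b:
        b_dates[tuple(doc[f] for f in KEY_FIELDS)].append(doc["Date"])
    for dates in b_dates.values():
        dates.sort()

    results = []
    for doc in docs_a:
        dates = b_dates.get(tuple(doc[f] for f in KEY_FIELDS), [])
        # First SYSTEMB date that is not earlier than (Date - tolerance).
        i = bisect_left(dates, doc["Date"] - tolerance)
        results.append(i < len(dates) and dates[i] <= doc["Date"] + tolerance)
    return results
```

The result is one boolean per SYSTEMA document, in input order, which is the same yes/no answer the scripted field idea is after.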

I thought about creating a scripted field for SYSTEMA that checks whether a similar SYSTEMB document exists. What do you think?

Do you have another way to achieve this?

Many thanks,
Regards.

arisbanach · January 8, 2018, 1:15am

Why not put all the fields at the top level and add an extra field called "system" whose value is "a" or "b"? Then just use a cardinality aggregation sorted from lowest to highest. That way, it will collect unique values into buckets and then display the buckets that have the most duplicate documents for those fields.
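As a rough illustration of this suggestion, the sketch below flattens the prefixed fields into shared names plus a "system" field, then groups documents by those shared fields in plain Python. Note that in Elasticsearch a cardinality aggregation returns only an approximate count of distinct values; the bucketing described here is closer to what a terms aggregation does. The helper names `flatten` and `bucket_by_shared_fields` are hypothetical, and the field name `CallDirection` is assumed to be written without a space.

```python
from collections import defaultdict

# Shared field names once the SYSTEMA./SYSTEMB. prefix is removed (assumed).
SHARED_FIELDS = ("EndUserName", "EndUserPhoneNumber",
                 "CounterpartPhoneNumber", "CallDirection")


def flatten(doc, system):
    """Strip the 'SYSTEMA.' / 'SYSTEMB.' prefix and add a 'system' field."""
    prefix = f"SYSTEM{system.upper()}."
    flat = {name[len(prefix):]: value
            for name, value in doc.items() if name.startswith(prefix)}
    flat["system"] = system
    return flat


def bucket_by_shared_fields(flat_docs):
    """Group documents by the shared fields; largest buckets come first."""
    buckets = defaultdict(list)
    for doc in flat_docs:
        buckets[tuple(doc[f] for f in SHARED_FIELDS)].append(doc["system"])
    return sorted(buckets.items(), key=lambda item: len(item[1]), reverse=True)
```

A bucket whose list contains both "a" and "b" is a candidate pair. This grouping does not look at the date, so the "within a few seconds" condition would still need a separate check on each candidate bucket.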

Antoine94 · January 8, 2018, 9:04am

Thank you for your reply.

I'm looking for the most effective way to achieve this. Do you think that adding a scripted field that looks for a similar document is the best way?

I'm thinking about something like a scripted field on system A documents that tells whether a similar system B document exists.

What do you think?

Thanks again for your answer.

arisbanach · January 8, 2018, 2:20pm

I'm not sure since I'm relatively new to Elasticsearch, sorry! I think that a scripted field could work, but I don't know if there are better ways.

system · February 5, 2018, 2:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
