In Elasticsearch, is it possible to cluster documents that share the most similar texts, without giving an initial query to compare to?

pachilo · June 27, 2017, 12:57pm

In Elasticsearch, is it possible to group documents that have the most similar texts without providing an initial query to compare them to?

I know it is possible to run a "more like this" query against a given document, but is it possible to cluster the documents within an index according to the values of a field?

For instance:

document 1: The quick brown fox jumps over the lazy dog

document 2: Barcelona is a great city

document 3: The fast orange fox jumps over the lazy dog

document 4: Madrid is a great city

document 5: I do not like to eat fish

Now, is there some kind of aggregation that, without a search query, can produce these groups:

Group 1: document 1 and document 3

Group 2: document 2 and document 4

Group 3: document 5
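The grouping in this example can be reproduced outside Elasticsearch. Below is a minimal sketch, assuming the texts have already been fetched from the index; it compares lower-cased word sets with Jaccard similarity and merges any pair at or above a threshold of 0.5. The helper names and the threshold are hypothetical choices for illustration, not part of any Elasticsearch API, and the regex tokenizer is only a crude stand-in for a real analyzer.

```python
import re

# The five example documents from the question, keyed by document number.
DOCS = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "Barcelona is a great city",
    3: "The fast orange fox jumps over the lazy dog",
    4: "Madrid is a great city",
    5: "I do not like to eat fish",
}


def tokens(text):
    """Lower-cased set of words (hypothetical stand-in for an analyzer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Number of shared distinct words divided by the number of distinct words overall."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_similarity(docs, threshold=0.5):
    """Single-link grouping: any pair at or above `threshold` lands in one group."""
    ids = sorted(docs)
    parent = {i: i for i in ids}  # union-find forest, one tree per group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = {i: tokens(docs[i]) for i in ids}
    for x in ids:
        for y in ids:
            if x < y and jaccard(toks[x], toks[y]) >= threshold:
                parent[find(y)] = find(x)  # merge the two groups

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(group_by_similarity(DOCS))  # [[1, 3], [2, 4], [5]]
```

Documents 1 and 3 share 6 of 10 distinct words (0.6), and documents 2 and 4 share 4 of 6 (about 0.67), while document 5 shares no words with any other, so it ends up alone. Note that this compares every pair, which is fine for a handful of documents but grows quadratically with the index size.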

I would really appreciate any clues!
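For comparison, the "more like this" approach mentioned in the question always needs something to compare against. A minimal sketch of such a request body is below, built as a plain Python dict; the index name `texts`, the field `body` and the document id are hypothetical. The parameters used (`fields`, `like`, `min_term_freq`, `min_doc_freq`) are documented options of the `more_like_this` query, but check the reference for the version in use, since 5.x-era examples also include a `_type` in each `like` entry.

```python
import json


def more_like_this_body(index, doc_id, field="body"):
    """Search body asking for documents similar to one seed document.

    `index`, `doc_id` and `field` are hypothetical placeholders.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": [field],
                # The seed document: MLT always compares against something.
                "like": [{"_index": index, "_id": doc_id}],
                # The defaults (min_term_freq=2, min_doc_freq=5) would discard
                # almost every term in a five-document corpus, so relax them.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }


print(json.dumps(more_like_this_body("texts", "1"), indent=2))
```

Running this once per document would give each document its own list of neighbours, but not a single set of disjoint groups, which is the gap the question is about.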

colings86 · June 27, 2017, 2:08pm

There is currently no aggregation that performs clustering. There is an issue for adding k-means clustering as an aggregation (https://github.com/elastic/elasticsearch/issues/5512), and I experimented with a prototype a while ago. However, supporting this kind of aggregation requires changes to the aggregations framework itself, and that work has not been done yet.
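Until such an aggregation exists, one option is to pull the texts out of the index and cluster them client-side. The sketch below is plain k-means over TF-IDF vectors in pure Python, assuming the documents have already been retrieved. It is not the prototype mentioned above and not an Elasticsearch API; the function names are hypothetical, and `k` must be chosen up front (3 here, to match the example in the question).

```python
import math
import re
from collections import Counter, defaultdict


def tfidf_vectors(texts):
    """Turn each text into an L2-normalised sparse TF-IDF dict {term: weight}."""
    docs = [Counter(re.findall(r"\w+", t.lower())) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency
    vectors = []
    for d in docs:
        vec = {term: tf * math.log(n / df[term]) for term, tf in d.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def mean(vectors):
    """Component-wise average of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for k, w in v.items():
            total[k] += w
    return {k: w / len(vectors) for k, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-first seeding; returns labels."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Seed with the point farthest from every centroid chosen so far.
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


texts = [
    "The quick brown fox jumps over the lazy dog",
    "Barcelona is a great city",
    "The fast orange fox jumps over the lazy dog",
    "Madrid is a great city",
    "I do not like to eat fish",
]
labels = kmeans(tfidf_vectors(texts), k=3)
groups = {}
for doc_no, label in enumerate(labels, start=1):
    groups.setdefault(label, []).append(doc_no)
print(sorted(groups.values()))  # [[1, 3], [2, 4], [5]]
```

Farthest-first seeding is used instead of random initialisation so the result is reproducible. Having to fix `k` in advance is one reason clustering does not map neatly onto a single aggregation request.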

pachilo · June 27, 2017, 2:18pm

Ohhh, good and sad to know then. Thanks a lot, Colin.

system · July 25, 2017, 2:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Grouping by similarity	Elasticsearch	6	1942	May 20, 2019
How to find Similar documents	Elasticsearch	4	2528	July 5, 2017
How to find document similarity in ElasticSearch?	Elasticsearch	2	443	July 5, 2017
Group Documents by it's similarity	Elasticsearch	1	340	August 30, 2019
Document Clustering	Elasticsearch	3	1154	July 6, 2017