How to index short sentences for similarity search?

jimw · May 20, 2022, 4:01pm

I have a huge dataset; each document in the dataset contains several lines of short sentences.

My problem: given a document, I need to find similar documents based on a threshold for the percentage of short sentences that are the same. For example, with a threshold of 25%, two documents are considered similar if 25% of their short sentences are identical.

My question:
How should I index the documents, and which similarity algorithm should I use? Thanks in advance for any suggestions and feedback.
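To make the threshold concrete, here is a minimal Python sketch of one possible reading of it: the fraction of the query document's distinct sentences that also occur, character for character, in the other document. The function names and sample sentences are made up for illustration, and measuring against the query document (rather than the smaller or larger of the two) is an assumption, since the post does not say which document the percentage refers to.

```python
def shared_fraction(query_sentences, candidate_sentences):
    """Fraction of the query's distinct sentences that also occur in the candidate."""
    # Treat each document as a set of stripped, non-empty sentences.
    query = {s.strip() for s in query_sentences if s.strip()}
    candidate = {s.strip() for s in candidate_sentences if s.strip()}
    if not query:
        return 0.0
    return len(query & candidate) / len(query)


def is_similar(query_sentences, candidate_sentences, threshold=0.25):
    """True when at least `threshold` of the query's sentences are shared."""
    return shared_fraction(query_sentences, candidate_sentences) >= threshold


# Hypothetical sample documents, one short sentence per list item.
doc_a = ["open the door", "turn on the light", "close the window", "lock the gate"]
doc_b = ["turn on the light", "feed the cat"]

print(shared_fraction(doc_a, doc_b))  # 1 of 4 sentences shared
print(is_similar(doc_a, doc_b))
```

Note that this measure is not symmetric: from doc_b's side, one of its two sentences is shared, so the fraction differs depending on which document is the query.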

Tomo_M · May 21, 2022, 8:22am

If you need exact matches on the short sentences, you could index them in a keyword field and run a More Like This query on that field. How about that?
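A rough sketch of what that suggestion could look like, written as Python dicts holding the JSON request bodies. The index name `docs` and field name `sentences` are hypothetical, and the parameter values are assumptions rather than tested settings. Each sentence is stored as one value of a keyword field, so it is matched as a whole, and `minimum_should_match` carries the percentage threshold. `min_term_freq` and `min_doc_freq` are lowered to 1 because their defaults (2 and 5) would otherwise drop sentences that occur only once.

```python
import json

# Hypothetical names, not from the thread.
INDEX_NAME = "docs"
FIELD = "sentences"

# Body for index creation: a keyword field is not analyzed, so every
# sentence in the array is indexed as a single exact term.
mapping_body = {"mappings": {"properties": {FIELD: {"type": "keyword"}}}}


def build_mlt_query(sentences, threshold_percent=25):
    """Build a More Like This search body for an unindexed ("artificial") document."""
    distinct = sorted(set(sentences))
    return {
        "query": {
            "more_like_this": {
                "fields": [FIELD],
                # Artificial document: the sentences to compare against.
                "like": [{"_index": INDEX_NAME, "doc": {FIELD: distinct}}],
                "min_term_freq": 1,
                "min_doc_freq": 1,
                # Assumption: keep every distinct sentence as a query term so
                # the percentage applies to the whole document.
                "max_query_terms": max(1, len(distinct)),
                "minimum_should_match": f"{threshold_percent}%",
            }
        }
    }


query_body = build_mlt_query(["turn on the light", "feed the cat", "feed the cat"])
print(json.dumps(mapping_body, indent=2))
print(json.dumps(query_body, indent=2))
```

These bodies would be sent to the index creation and `_search` endpoints respectively. Very long documents may run into the cluster's limit on the number of query clauses, and the percentage is taken over the terms that More Like This selects from the input document, so treat the result as an approximation of the 25% rule and verify candidates afterwards if the exact threshold matters.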

