Speeding up "more_like_this" query

arshad_javeed · October 23, 2019, 10:51am

Hi,

I'm interested in fetching documents similar to a given input document (much like KNN). Since I'm dealing only with text fields, I went with the more_like_this query, which does the job for text fields.
However, I'm concerned about performance once millions of documents are indexed in ES. The documentation says that using term_vector to store term vectors at index time can speed up the analysis.
What I don't understand is which type of term vector the documentation refers to in this context, since there are three different types: term information, term statistics, and field statistics.
Term statistics and field statistics compute term frequencies with respect to the other documents in the index, so wouldn't these vectors become outdated when I add new documents to the index?
Hence I presume that the more_like_this documentation refers to term information (the information about the terms in one particular document, irrespective of the others).
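
To make the three types concrete, here is a rough sketch of a request to the term vectors API in which each type is toggled by its own flag. It assumes Elasticsearch 7.x (typeless `/<index>/_termvectors/<id>` endpoint); the index name `articles`, the document id `1`, the field `body`, and the helper `termvectors_request` are hypothetical and only build the request, they do not call a cluster.

```python
import json


def termvectors_request(index, doc_id, fields, with_statistics=False):
    """Build the path and JSON body for GET /<index>/_termvectors/<id>.

    Assumes the Elasticsearch 7.x typeless endpoint. Nothing is sent;
    the caller passes path and body to whatever HTTP client is in use.
    """
    path = f"/{index}/_termvectors/{doc_id}"
    body = {
        "fields": fields,
        # Term information: per-document data (positions, offsets, payloads).
        "positions": True,
        "offsets": True,
        "payloads": False,
        # Term statistics: total term frequency and document frequency.
        "term_statistics": with_statistics,
        # Field statistics: document count and summed frequencies for the field.
        "field_statistics": with_statistics,
    }
    return path, body


# Hypothetical index, id, and field names, for illustration only.
path, body = termvectors_request("articles", "1", ["body"])
print("GET", path)
print(json.dumps(body, indent=2))
```

With `with_statistics=False` the request asks only for the per-document term information; passing `True` additionally requests the two index-wide statistics, which is the part I suspect cannot be fixed at index time.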

Can anyone let me know if computing only the term information vector at the index time is sufficient to speed up more_like_this?
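
For reference, this is roughly the setup in question, sketched as plain request bodies. It assumes Elasticsearch 7.x typeless mappings; the index name `articles`, the field `body`, and both helper functions are hypothetical. Setting `term_vector` to `"yes"` stores only the terms of each document, and whether that alone is enough to speed up more_like_this is exactly what is being asked, not something this sketch confirms.

```python
import json


def mapping_with_term_vectors(field, term_vector="yes"):
    """Body for PUT /<index>: a text field that stores term vectors.

    term_vector="yes" stores just the terms; other documented values such
    as "with_positions_offsets" store more per-document detail.
    """
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "term_vector": term_vector}
            }
        }
    }


def more_like_this_query(index, doc_id, fields, size=10):
    """Body for GET /<index>/_search: documents similar to an indexed one."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": fields,
                # Reference an already indexed document as the input.
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,
                "max_query_terms": 25,
            }
        },
    }


# Hypothetical names, for illustration only.
print(json.dumps(mapping_with_term_vectors("body"), indent=2))
print(json.dumps(more_like_this_query("articles", "1", ["body"]), indent=2))
```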
Also, it would be helpful if there's a performance evaluation report/stats for "more_like_this".

Thanks.

