Hi,
I have a very basic Elasticsearch setup. All documents have just two fields: id and content.
I want to find the top 20 (or 100) most similar documents for each document in my index, ranked by BM25 score. My understanding is that this can be achieved by issuing more_like_this (MLT) queries. However, for some documents I receive fewer than 20 results, and for some none at all. Shouldn't every document in the index receive a score, however low it is? Furthermore, I know that there are fairly similar documents in my dataset, so getting back 0 or just 4 similar documents is definitely not the answer I was looking for.
To summarize: for every document in my index, I want the 20 documents with the highest BM25 scores on the content field. Right now my query looks like this:
{
    'query': {
        'more_like_this': {
            'fields': ['content'],
            'like': {'_index': 'war_stories', '_id': 85},
            'min_term_freq': 1,
            'min_doc_freq': 1
        }
    },
    'from': 0,
    'size': 20
}
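For reference, here is a small Python sketch of how the same request body could be built with the other term-selection parameters written out. It assumes that the documented more_like_this options max_query_terms (default 25) and minimum_should_match (default "30%") are what limits the number of hits, which I have not verified. The helper name build_mlt_query and the values chosen for those two parameters are hypothetical.

```python
def build_mlt_query(index, doc_id, size=20, max_query_terms=100,
                    minimum_should_match=1):
    """Build a more_like_this request body for one source document.

    max_query_terms and minimum_should_match are set explicitly here as an
    experiment (assumption: their defaults of 25 and "30%" are what filters
    out candidates). The values are illustrative, not recommendations.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": ["content"],
                # The document whose content is used as the query text.
                "like": {"_index": index, "_id": doc_id},
                # Same relaxed thresholds as in my original query.
                "min_term_freq": 1,
                "min_doc_freq": 1,
                # Number of terms selected from the source document.
                "max_query_terms": max_query_terms,
                # How many of the selected terms a hit must contain.
                "minimum_should_match": minimum_should_match,
            }
        },
        "from": 0,
        "size": size,
    }


body = build_mlt_query("war_stories", 85)
```

The resulting body can be sent as the search request in place of the query above. Even with these settings, a document that shares none of the selected terms would presumably still get no score and not be returned.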
The index has roughly 22,000 documents in a single local shard.
Thanks for any insight.