Percentage of matched terms in Elasticsearch

sau.sharma · July 25, 2016, 9:45pm

I am using Elasticsearch to find similar documents. Below is the query I am using:

{
    "query": {
        "more_like_this":{
            "like": {
                "_index": "docs",
                "_type": "pdfs",
                "_id": "pdf_1"
            },
            "min_term_freq": 1,
            "min_doc_freq": 1,
            "max_query_terms": 50,
            "minimum_should_match": "50%"
        }
    }
}

I am extracting the text from PDFs and storing it in my index "docs". Below is the mapping for the type "pdfs":

{
    "properties": {
        "content": {
            "type": "string",
            "analyzer": "my_analyzer"
        }
    }
}

In the result set I get similar documents along with their scores. From what I have read so far, it is not possible to calculate a percentage similarity from the score, so I am not trying to do that. Instead, I am trying to figure out whether it is possible to know:

"Out of the 50 query terms taken from the source document, how many are matched in a given result document? Or, put differently, what percentage of the terms are matched?"

As you can see, my query specifies minimum_should_match as 50%, so I assume Elasticsearch filters documents somewhere based on the percentage of terms matched in each document. I want to get that percentage. I am fairly new to Elasticsearch; I have gone through the documentation but could not find out how to do it. Any pointers or help would be appreciated!
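
To make the question concrete, here is a minimal Python sketch of the number I am after. It runs outside Elasticsearch and assumes I already have the list of terms selected for the query and the analyzed terms of a result document, which is exactly the part I do not know how to obtain. The function names and sample terms are hypothetical. The rounding-down in required_matches reflects my reading of the minimum_should_match documentation for positive percentages, not anything I have confirmed in the source.

```python
def required_matches(num_terms, percent):
    """Minimum number of optional terms that must match.

    Assumption: a positive percentage is rounded down to a whole
    number of terms, as I understand the minimum_should_match docs.
    """
    return (num_terms * percent) // 100


def matched_percentage(query_terms, doc_terms):
    """Return (matched term count, percentage of query terms matched)."""
    query = set(query_terms)
    if not query:
        return 0, 0.0
    matched = query & set(doc_terms)
    return len(matched), 100.0 * len(matched) / len(query)


# Hypothetical terms, purely for illustration.
query_terms = ["elasticsearch", "similar", "document", "score"]
doc_terms = ["document", "score", "lucene"]

count, pct = matched_percentage(query_terms, doc_terms)
print(required_matches(50, 50))  # terms needed for 50 query terms at 50%
print(count, pct)
```

With max_query_terms of 50 and minimum_should_match of "50%", required_matches gives 25, so every returned document should match at least 25 of the selected terms. What I want is the per-document count or percentage that matched_percentage computes here.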
