Percentage of matched terms in Elasticsearch

I am using Elasticsearch to find similar documents. Below is the query I am using:

{
    "query": {
        "more_like_this":{
            "like": {
                "_index": "docs",
                "_type": "pdfs",
                "_id": "pdf_1"
            },
            "min_term_freq": 1,
            "min_doc_freq": 1,
            "max_query_terms": 50,
            "minimum_should_match": "50%"
        }
    }
}

I am extracting the text from PDFs and storing it in my index "docs". Below is the mapping for the type "pdfs":

{
    "properties": {
        "content": {
            "type": "string",
            "analyzer": "my_analyzer"
        }
    }
}

The results contain similar documents along with their scores. From what I have read so far, it is not possible to derive a percentage similarity from the score, so I am not trying to do that. Instead, I would like to know:

"Out of the 50 query terms taken from the source document, how many are matched in a given result document? Or, what percentage of the terms matched?"

As you can see, my query sets minimum_should_match to 50%, so I assume Elasticsearch is filtering documents somewhere based on the percentage of terms that match in each document. I want to get that percentage. I am fairly new to Elasticsearch and have gone through the documentation, but I couldn't find out how to do it. Any pointers or help would be appreciated!
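
To make the question concrete, here is a minimal Python sketch of the number I am after. It is only an illustration, not something Elasticsearch exposes as far as I know: the function names and the example term lists are hypothetical, and it assumes I could somehow obtain the selected query terms and the analyzed terms of a result document. The first function reflects my understanding that a positive percentage in minimum_should_match is rounded down to a whole number of terms.

```python
def required_matches(num_query_terms, percent):
    """Number of terms that must match for a positive percentage,
    rounded down (my understanding of minimum_should_match)."""
    return (num_query_terms * percent) // 100


def matched_percentage(query_terms, doc_terms):
    """Return (matched_count, percentage) of distinct query terms
    that also occur in the document's terms."""
    query = set(query_terms)
    if not query:
        return 0, 0.0
    matched = query & set(doc_terms)
    return len(matched), 100.0 * len(matched) / len(query)


# Hypothetical example: 4 query terms, 2 of them present in the document.
count, pct = matched_percentage(["a", "b", "c", "d"], ["a", "c", "x"])
print(count, pct)                # 2 50.0
print(required_matches(50, 50))  # 25
```

So with 50 query terms and "50%", I would expect a document to need at least 25 matching terms, and what I want back for each hit is the equivalent of `count` and `pct` above.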
