BM25 score between two indexed documents in Elasticsearch


(Saeed Zhiany) #1

I have an index with two types, t1 and t2. A few documents have been indexed in each of them, with the following structure:

{
  "mappings": {
    "t1": {
      "properties": {
        "body": {
          "type": "string"
        }
      }
    },
    "t2": {
      "properties": {
        "body": {
          "type": "string"
        }
      }
    }
  }
}

How can I obtain the BM25 score between two arbitrary documents D1 and D2? Please note that I don't want to find documents similar to D1 or D2, so I think the more_like_this query is not suitable for my case.


(David Pilato) #2

Note that multiple types are not allowed anymore in 6.x.


(Saeed Zhiany) #3

I didn't know that, thank you! But it is not important right now because, as you can see, the documents have the same structure, so I can store them in one type.
Do you have any solution for the main problem?


(David Pilato) #4

You can just obtain the individual scores and compare them on the client side, I think.

What do you want to do with that score exactly?


(Saeed Zhiany) #5

How can I do that?!
BM25 is defined as a score between a query and a document, and the query can itself be a document. Am I right?

I want to use this score as a feature in a learning-to-rank application.


(David Pilato) #6

You mean getting a score when there is no query?

I don't know.

Maybe others have ideas.

If you extract all the values you have in document A and then build a bool query with a should clause for each field, maybe you can call https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search-explain.html and get that?

But I'm unsure here if it would help.


(Saeed Zhiany) #7

"You mean getting a score when there is no query?"

Exactly. The original BM25 formula can be used to calculate a score between any two documents (a query can also be considered a document).

"If you extract all the values you have in document A and then build a bool query with a should clause for each field, maybe you can call https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search-explain.html and get that?"

I don't think this works, because the formula is a sum of weights aggregated over the words common to the two documents.

I can write code that calculates this score; it's not hard. But I thought it might already be implemented in Elasticsearch in an optimized way.
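
For reference, a minimal sketch of that calculation in Python, assuming the documents are already tokenized and the whole collection fits in memory. The function name `bm25_between`, its parameters, and the defaults k1 = 1.2 and b = 0.75 are illustrative assumptions, not something Elasticsearch exposes. The IDF form with 1 added inside the logarithm is the one I believe Lucene's BM25Similarity uses, so treat that as an assumption too:

```python
import math
from collections import Counter


def bm25_between(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score doc_tokens against query_tokens, treating the first document as the query.

    corpus is a list of token lists; it is used only for IDF and average length.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for d in corpus for term in set(d))
    tf = Counter(doc_tokens)
    score = 0.0
    # Only terms shared by both documents contribute; each shared term counts once.
    for term in set(query_tokens) & set(doc_tokens):
        idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


# Tiny made-up collection, only to show the call.
corpus = [["a", "b"], ["a", "c"]]
print(bm25_between(["b"], corpus[0], corpus))
```

Note that the score is not symmetric: scoring D2 against D1 normalizes by the length of D2, and scoring D1 against D2 normalizes by the length of D1, so the two directions generally differ.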


(David Pilato) #8

I dunno. @jimczi or @jpountz might have ideas?


(Saeed Zhiany) #9

To calculate BM25 myself, I need the length of each document and the average document length over the whole collection. According to the formula, the length of a document is its number of words. I have searched a lot but could not find a way to make Elasticsearch return these values.
Could you please suggest a solution?
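
One avenue that may be worth checking, offered as a sketch rather than a confirmed answer: the Term Vectors API returns a `term_freq` for each term of a document and, when field statistics are requested, `sum_ttf` and `doc_count` for the field. Summing `term_freq` gives a token count for the document, and `sum_ttf / doc_count` approximates the average length. The response below is a hand-made sample in that shape, not real output, and the helper names are hypothetical. As far as I understand, these statistics are computed per shard and do not account for deleted documents, so they are approximate.

```python
def doc_length(termvectors_response, field="body"):
    """Token count of one document: the sum of term_freq over its terms."""
    terms = termvectors_response["term_vectors"][field]["terms"]
    return sum(info["term_freq"] for info in terms.values())


def average_doc_length(termvectors_response, field="body"):
    """Average tokens per document for the field: sum_ttf / doc_count."""
    stats = termvectors_response["term_vectors"][field]["field_statistics"]
    return stats["sum_ttf"] / stats["doc_count"]


# Hand-made sample shaped like a Term Vectors response (values are invented).
sample_response = {
    "term_vectors": {
        "body": {
            "field_statistics": {"sum_doc_freq": 9, "doc_count": 4, "sum_ttf": 12},
            "terms": {
                "bm25": {"term_freq": 1},
                "score": {"term_freq": 2},
            },
        }
    }
}

print(doc_length(sample_response), average_doc_length(sample_response))
```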


(Adrien Grand) #10

There is no built-in way to do this. One way would be to build a giant disjunction out of D1 and run it against D2.
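
A rough sketch of what such a disjunction could look like, written as a Python helper that only builds the request body. The helper name `disjunction_query` is hypothetical, and it assumes `d1_terms` holds the already analyzed tokens of D1 for the `body` field, since `term` queries are not analyzed. The `ids` filter restricts the search to D2 without affecting the score:

```python
def disjunction_query(d1_terms, d2_id, field="body"):
    """Build a bool query with one should clause per distinct term of D1,
    restricted to the single document D2 by an ids filter."""
    should = [{"term": {field: term}} for term in sorted(set(d1_terms))]
    return {
        "query": {
            "bool": {
                "should": should,
                "filter": {"ids": {"values": [d2_id]}},
            }
        }
    }


# Invented tokens and document id, only to show the resulting body.
body = disjunction_query(["score", "bm25", "score"], "42")
print(body)
```

The `_score` of the single hit would then be the BM25 score of D2 for the terms of D1. For long documents the number of clauses may exceed the default limit on bool clauses (1024, if I remember correctly), so that setting might need raising.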

