Dot product in Elasticsearch


(Chris) #1

How can I compute a dot product in Elasticsearch?

Let's say my indexed documents are in the form:

{
   "name": "...",
   "features": [
      {"f_i": 0.xxx},
      ...
      {"f_j": 0.xxx}
   ]
}

For example, one document could be:

{
   "name": "img1",
   "features": [
      {"f1": 0.6},
      {"f3": 0.4}
   ]
}

Another document could be:

{
   "name": "img2",
   "features": [
      {"f2": 0.1},
      {"f3": 0.5},
      {"f5": 0.4}
   ]
}

Basically, the features field of each document is a sparse vector in which only the elements with values > 0 are listed. Assume the full length of the sparse vector is known, say 100.

Now I want to query with a new document to find the indexed document with the most similar features, where similarity is measured by the dot product of the two vectors. Say my query document is:

{
   "name": "img",
   "features": [
      {"f3": 0.2},
      {"f4": 0.4},
      {"f5": 0.1},
      {"f6": 0.3}
   ]
}

Then the dot product with the first example above has a nonzero term only at f3, so it is 0.4 x 0.2 = 0.08, while the dot product with the second example has nonzero terms at f3 and f5, so it is 0.5 x 0.2 + 0.4 x 0.1 = 0.14. Thus the query should give a higher score to the second document.
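
To make the intended scoring concrete, here is a minimal Python sketch of the same calculation done outside of Elasticsearch. It only illustrates the ranking I am after and is not an Elasticsearch query. The helper names to_vector and sparse_dot are made up for this example, and it assumes each entry in "features" is a single-key object as in the samples above.

```python
def to_vector(doc):
    """Flatten the "features" list of single-key objects into one dict."""
    vec = {}
    for entry in doc["features"]:
        vec.update(entry)
    return vec


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {feature: weight} dicts."""
    # Iterate over the smaller dict; a feature missing on either side
    # contributes 0 to the sum, so it can simply be skipped.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[name] for name, weight in a.items() if name in b)


img1 = {"name": "img1", "features": [{"f1": 0.6}, {"f3": 0.4}]}
img2 = {"name": "img2", "features": [{"f2": 0.1}, {"f3": 0.5}, {"f5": 0.4}]}
query = {
    "name": "img",
    "features": [{"f3": 0.2}, {"f4": 0.4}, {"f5": 0.1}, {"f6": 0.3}],
}

q = to_vector(query)
scores = {doc["name"]: sparse_dot(q, to_vector(doc)) for doc in (img1, img2)}
best = max(scores, key=scores.get)
print(scores, best)
```

Up to floating-point rounding, this gives about 0.08 for img1 and 0.14 for img2, so img2 ranks first, which is the ordering I would like the Elasticsearch query to produce.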

How do I do that? Thanks

P.S.: I had another topic asking about nearest-neighbor search in ES. I have since figured out that that process should be done outside of ES, so the remaining problem can now be stated more clearly, as above.


(Chris) #2

Any ideas would be greatly appreciated. Thanks a lot!


(system) #3