Is there any way to do complex mathematical matching against indexed values at retrieval time (not just simple text comparison)?


(rohitkondekar) #1

Hello everyone,
We are working on a project to develop a content-based video
retrieval system. We have used Hadoop + JavaCV + Xuggler to hash the
videos into feature vectors represented as histograms. For real-time
retrieval of these videos, we were planning to use Elasticsearch or
Solr to index these feature vectors.

The problem is this: when the user supplies an input query clip, it is
also hashed to obtain feature vectors, which can then be compared with
the indexed hashed values. But Solr does a string comparison, or I
should say text matching, for retrieval, whereas what we really need
is something like a histogram intersection of the query's histogram
with the indexed ones to obtain a similarity value. So how can we do
this? Is there any way to do a mathematical comparison in
Elasticsearch?
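
For readers unfamiliar with the term, histogram intersection can be sketched as below. This is only an illustration, not the project's actual code: the function name and the sample histograms are hypothetical, and dividing by the query histogram's total is one possible normalisation, assumed here.

```python
def histogram_intersection(query, indexed):
    """Sum of bin-wise minima, divided by the query histogram's total.

    The normalisation by the query total is an assumption; other
    conventions (e.g. dividing by the indexed histogram's total) exist.
    """
    if len(query) != len(indexed):
        raise ValueError("histograms must have the same number of bins")
    total = sum(query)
    if total == 0:
        return 0.0  # an empty query histogram overlaps with nothing
    overlap = sum(min(q, i) for q, i in zip(query, indexed))
    return overlap / total


# Hypothetical 4-bin histograms, for illustration only.
query_hist = [4, 0, 3, 1]
indexed_hist = [2, 1, 3, 2]
score = histogram_intersection(query_hist, indexed_hist)
print(score)
```

A score of 1.0 would mean every bin of the query is fully covered by the indexed histogram; 0.0 means no overlap at all.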

Thank you.


(Clinton Gormley) #2

Hi Rohit

On Mon, 2012-02-20 at 18:51 -0800, Rohit wrote:

[...] what we really need is something like a histogram intersection
of the query's histogram with the indexed ones to obtain a similarity
value. So how can we do this? Is there any way to do a mathematical
comparison in Elasticsearch?

You could look at using scripts, written in MVEL, JavaScript, Python or
native Java; see:
http://www.elasticsearch.org/guide/reference/modules/scripting.html
http://www.elasticsearch.org/guide/reference/query-dsl/custom-score-query.html
http://www.elasticsearch.org/guide/reference/query-dsl/custom-filters-score-query.html
http://www.elasticsearch.org/guide/reference/query-dsl/script-filter.html

But be aware that this could be quite heavy, as it will probably need to
run the script on all docs.
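
As a rough sketch of what such a request might look like, the Python below only builds a JSON body for a `custom_score` query; it does not talk to a cluster. Everything specific here is an assumption to check against the pages linked above: the per-bin field names `h0`, `h1`, ... are hypothetical (one numeric field per histogram bin), and the script relies on a `min(a, b)` function and `doc['field'].value` access being available in the scripting language in use.

```python
import json


def build_custom_score_body(query_hist, field_prefix="h"):
    """Build a request body that scores docs by unnormalised histogram intersection.

    Assumes each document stores bin i in a numeric field named
    field_prefix + str(i) (hypothetical layout).
    """
    # One script term per bin: min(doc['h0'].value, q0), min(doc['h1'].value, q1), ...
    terms = [
        "min(doc['%s%d'].value, q%d)" % (field_prefix, i, i)
        for i in range(len(query_hist))
    ]
    # The query histogram is passed as script parameters q0, q1, ...
    params = {"q%d" % i: value for i, value in enumerate(query_hist)}
    return {
        "query": {
            "custom_score": {
                # match_all means the script is evaluated for every doc,
                # which is the cost mentioned above.
                "query": {"match_all": {}},
                "script": " + ".join(terms),
                "params": params,
            }
        }
    }


body = build_custom_score_body([10, 2, 10, 13])
print(json.dumps(body, indent=2))
```

Replacing `match_all` with a cheaper pre-filter, where one exists, would reduce the number of docs the script has to visit.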

clint


(rohitkondekar) #3

Hi Clinton,

I looked into native (Java) scripts, but from what I can tell, the script runs after the indexed docs have been matched, on the score that is retrieved (in short, on the docs).
So how would I do the search based on a computed similarity factor?

Could you clarify this a bit? For example, I have two vectors:
v1 - [10,2,10,13] with a corresponding URL
v2 - [11,14,1,3]

I want to index v1 somehow (how should I do it? What should the index/type/id and JSON doc be?).

Then I want to match v2 against v1 to get a similarity factor, for example by computing the Euclidean distance, and retrieve v1 only if it comes out to be at least 95% similar.

So how can I do this using native scripts?
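
To make the comparison concrete, here is a minimal sketch of the computation itself, outside Elasticsearch. The mapping from distance to a similarity in (0, 1], `1 / (1 + d)`, is an assumption made purely for illustration, since "95% similar" is not defined above; the function names are hypothetical as well.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Map distance to (0, 1]; 1.0 means identical vectors (assumed mapping)."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


v1 = [10, 2, 10, 13]  # indexed vector from the example
v2 = [11, 14, 1, 3]   # query vector from the example

distance = euclidean_distance(v1, v2)
is_match = similarity(v1, v2) >= 0.95  # keep only near-identical vectors
print(distance, is_match)
```

With these two vectors the squared differences sum to 326, so the distance is about 18.06 and the pair falls well short of a 0.95 threshold under this mapping.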

Please help me out; I am completely stuck on this point.

Thanks

