Document Similarity

Hi Guys,

I'd like to find out what options there are for determining how similar two
documents are when both are stored in ES.

While I can use the more_like_this_field query to get a score based
on a query, is there any way to provide two document IDs and get the
similarity score between them?

I have found this approach, which uses Lucene to compare two documents:
http://stackoverflow.com/questions/1844194/get-cosine-similarity-between-two-documents-in-lucene
So I can create my own query if required (I'm maintaining my own ES fork for
the time being), but I was hoping there is something already in ES that would
let me do this.
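
To make the pairwise case concrete, here is a minimal sketch of the kind of
score I'm after: cosine similarity over raw term counts. It runs outside ES
and is only an illustration. The names term_frequencies and cosine_similarity
are hypothetical, the whitespace tokenizer is a naive stand-in for a real
analyzer, and there is no IDF weighting.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count terms using a naive tokenizer (lowercase, split on whitespace).

    Assumption: a real setup would reuse the index's analyzer instead.
    """
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both documents contribute to the dot product.
    shared_terms = set(tf_a) & set(tf_b)
    dot = sum(tf_a[t] * tf_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = term_frequencies("the quick brown fox")
doc_b = term_frequencies("the lazy brown dog")
score = cosine_similarity(doc_a, doc_b)
```

With raw counts the score falls between 0 (no shared terms) and 1 (identical
term distributions).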

The context is that I need to build a similarity matrix of all documents
stored in a single ES index, to determine how similar the documents are to
each other (this is to aid a machine learning activity). A query that handles
only two documents at a time would be fine, but creating the similarity
matrix directly in ES would be great! (Note: I am only dealing with a small
number of documents, fewer than 5000, but they are very large documents.)
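
If only a two-document score is available, the matrix could be assembled
client-side along these lines. This is a rough sketch under assumptions:
similarity_matrix and jaccard are hypothetical names, the in-memory docs dict
stands in for documents fetched from ES by ID, and Jaccard overlap is just a
placeholder for whatever pairwise measure ends up being used.

```python
from itertools import combinations


def similarity_matrix(doc_ids, similarity):
    """Build a symmetric N x N matrix of pairwise scores.

    `similarity(id_a, id_b)` is a hypothetical callable returning a float;
    it is assumed to be symmetric, so each pair is scored only once.
    """
    n = len(doc_ids)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # assumes a document is fully similar to itself
    for i, j in combinations(range(n), 2):
        score = similarity(doc_ids[i], doc_ids[j])
        matrix[i][j] = score
        matrix[j][i] = score
    return matrix


# Stand-in data: in practice these would be fetched from the index by ID.
docs = {
    "1": "the quick brown fox",
    "2": "the lazy brown dog",
    "3": "something else entirely",
}


def jaccard(id_a, id_b):
    """Placeholder pairwise measure: shared terms over all distinct terms."""
    a, b = set(docs[id_a].split()), set(docs[id_b].split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


matrix = similarity_matrix(list(docs), jaccard)
```

Scoring each unordered pair once means N * (N - 1) / 2 comparisons, so 5000
documents would need 12,497,500 pairwise calls.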

Kind regards

Darran

PS. My fork can be found here:
