Similar document detection system


(vineeth mohan) #1

Hi,

Is there a feature in ES where, when I give a document to Elasticsearch, it
tells me which documents already in ES are similar to the one I inserted?
For instance, let there be a similarity score between documents (from 1 to
100).

Whenever I add a document to ES, it should tell me which documents have a
similarity score above 70.
I also want ES to consider only documents from the last N days (not the
whole lot, if that takes too much processing power).

Thanks
Vineeth


(Clinton Gormley) #2

Hi Vineeth

> Is there a feature in ES where, when I give a document to
> Elasticsearch, it tells me which documents already in ES are similar
> to the one I inserted?
> For instance, let there be a similarity score between
> documents (from 1 to 100).

Look at:
http://www.elasticsearch.org/guide/reference/api/more-like-this.html
http://www.elasticsearch.org/guide/reference/query-dsl/mlt-query.html
http://www.elasticsearch.org/guide/reference/query-dsl/mlt-field-query.html
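A minimal sketch, in Python, of the request body for a `more_like_this` query as described on the mlt-query page linked above. The field names `title` and `body` are hypothetical, and the parameter names (`like_text`, `min_term_freq`, `max_query_terms`) follow the documentation of that era; later releases renamed `like_text` to `like`, so check the reference for your version. The dict would be sent as the JSON body of a `_search` request.

```python
import json


def mlt_body(text, fields, min_term_freq=1, max_query_terms=25):
    """Build a more_like_this search body for `text` (a sketch, not the official client)."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,        # hypothetical field names
                "like_text": text,       # text to find similar documents for
                # The documented default min_term_freq was 2, which drops every
                # term that appears only once in a short text; lowered here.
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


body = mlt_body("Elasticsearch duplicate detection", ["title", "body"])
print(json.dumps(body, indent=2))
```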

> Whenever I add a document to ES, it should tell me which
> documents have a similarity score above 70.
> I also want ES to consider only documents from the last N days
> (not the whole lot, if that takes too much processing power).

http://www.elasticsearch.org/guide/reference/query-dsl/range-filter.html

And for combining queries with one or more filters:
http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query.html
http://www.elasticsearch.org/guide/reference/query-dsl/and-filter.html
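To restrict matches to the last N days, the MLT query can be wrapped in a `filtered` query with a `range` filter, following the pages linked above. This sketch assumes a hypothetical date field `created_at` and computes the cutoff on the client (in UTC) rather than relying on server-side date math. Note that `filtered` was removed in Elasticsearch 5.0, where a `bool` query with a `filter` clause plays the same role.

```python
import json
from datetime import datetime, timedelta, timezone


def recent_mlt_body(text, fields, date_field, days, now=None):
    """MLT query limited to documents whose date_field falls in the last `days` days."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Cutoff as an ISO-8601 timestamp in UTC, without an offset suffix.
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%S")
    return {
        "query": {
            "filtered": {
                # Scoring part: similarity to the given text.
                "query": {
                    "more_like_this": {"fields": fields, "like_text": text, "min_term_freq": 1}
                },
                # Non-scoring part: only recent documents can match.
                "filter": {"range": {date_field: {"gte": cutoff}}},
            }
        }
    }


body = recent_mlt_body("text of the new document", ["body"], "created_at", days=7,
                       now=datetime(2011, 9, 29, tzinfo=timezone.utc))
print(json.dumps(body["query"]["filtered"]["filter"]))
# → {"range": {"created_at": {"gte": "2011-09-22T00:00:00"}}}
```

The filter only narrows which documents are eligible; the MLT part still decides how similar each one is.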

clint


(Karussell) #3

You should have a look at the more-like-this and fuzzy-like-this queries:

http://www.elasticsearch.org/guide/reference/query-dsl/mlt-query.html

http://www.elasticsearch.org/guide/reference/query-dsl/flt-query.html

and also

http://www.phpriot.com/news/duplicates-detection-with-elasticsearch

But to speed this up, you would probably have to use one of the
locality-sensitive hashing algorithms.
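No specific algorithm is named here; one common locality-sensitive hashing scheme for text is MinHash with banding, which estimates the Jaccard similarity of word shingles. The sketch below is that technique, chosen for illustration, and it runs in plain Python outside Elasticsearch. Scaling the estimate to 0-100 gives the bounded score the original question asks for, so a cutoff of 70 means roughly 70% shingle overlap. All class and function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Set of k-word shingles; texts shorter than k words become one shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    """Deterministic 64-bit hash; each seed acts as a separate hash function."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(text, num_perm=128):
    """MinHash signature: the minimum shingle hash for each seed."""
    sh = shingles(text)
    return [min(_hash(seed, s) for s in sh) for seed in range(num_perm)]


def similarity_score(sig_a, sig_b):
    """Estimated Jaccard similarity, scaled to 0..100."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return 100.0 * matches / len(sig_a)


class MinHashLSH:
    """Banded LSH index: documents sharing any band become candidates."""

    def __init__(self, num_perm=128, bands=32):
        if num_perm % bands:
            raise ValueError("num_perm must be divisible by bands")
        self.num_perm, self.bands = num_perm, bands
        self.rows = num_perm // bands
        self.buckets = defaultdict(set)
        self.signatures = {}

    def _band_keys(self, sig):
        r = self.rows
        return [(b, tuple(sig[b * r:(b + 1) * r])) for b in range(self.bands)]

    def add(self, doc_id, text):
        sig = minhash(text, self.num_perm)
        self.signatures[doc_id] = sig
        for key in self._band_keys(sig):
            self.buckets[key].add(doc_id)

    def similar(self, text, threshold=70.0):
        """(doc_id, score) pairs scoring above threshold, best first; does not add text."""
        sig = minhash(text, self.num_perm)
        candidates = set()
        for key in self._band_keys(sig):
            candidates |= self.buckets.get(key, set())
        scored = [(d, similarity_score(sig, self.signatures[d])) for d in candidates]
        return sorted([p for p in scored if p[1] > threshold], key=lambda p: -p[1])


index = MinHashLSH()
index.add("a", "elasticsearch makes duplicate detection easy with more like this queries")
index.add("b", "the weather in bangalore is pleasant at this time of the year")
print(index.similar("elasticsearch makes duplicate detection easy with more like this queries"))
# → [('a', 100.0)]
```

With 32 bands of 4 rows, a pair near the 70 threshold shares at least one bucket almost always, while weakly similar pairs rarely do, so only a small candidate set needs a full score comparison.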

(In reply to Vineeth Mohan, 29 Sep., 08:31)



(vineeth mohan) #4

Thanks, guys...
Elasticsearch simply rocks!

Thanks
Vineeth

(In reply to Karussell, Thu, Sep 29, 2011 at 12:56 PM)



(vineeth mohan) #5

Just one more question.

Is it possible to do the same without actually inserting the document?
I need to decide whether the document should be inserted by checking
whether there are duplicates of it.

Thanks
Vineeth

(Following up on my message of Thu, Sep 29, 2011 at 3:02 PM)



(Karussell) #6

Yes, just query for it via mlt or flt.

Querying is not indexing, so nothing gets stored.
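A sketch of the check-before-insert flow described above: send an MLT query built from the new document's text to `_search`, inspect the hits, and only then decide whether to index. The field name `body` and the helper names are hypothetical, and `sample` is a made-up response used only to show the `hits.hits[]` shape. One caveat: `_score` is a relative Lucene score, not a 1-100 value, so `min_score` here is an assumed cut-off you would have to tune on your own data.

```python
import json


def duplicate_check_body(text, fields, size=10):
    """Search body for an MLT query on a document that is NOT indexed."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": fields,       # hypothetical field names
                "like_text": text,      # the candidate document's text
                "min_term_freq": 1,
            }
        },
    }


def likely_duplicates(response, min_score):
    """Ids of hits whose raw _score reaches min_score (an assumed, tuned value)."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_id"] for hit in hits if hit["_score"] >= min_score]


def should_insert(response, min_score):
    """Index the new document only if no existing hit looks like a duplicate."""
    return not likely_duplicates(response, min_score)


body = duplicate_check_body("text of the new document", ["body"])
print(json.dumps(body))

# Shape of a parsed _search response, with made-up ids and scores.
sample = {"hits": {"hits": [{"_id": "17", "_score": 3.2}, {"_id": "42", "_score": 0.4}]}}
print(likely_duplicates(sample, min_score=1.0))  # → ['17']
print(should_insert(sample, min_score=1.0))      # → False
```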

(In reply to Vineeth Mohan, 29 Sep., 11:33)


