Duplicate documents detection in Elasticsearch

jprante · December 27, 2015, 1:14am

Yes, you can detect plagiarism, at client side. Use fingerprinting method and More-like-this to detect various degrees of document similarity.

You can easily detect duplicates, by just indexing checksum, like CRC-32 or Adler.

There is no plugin I know of and the reason is obvious, the scenarios and requirements are too difficult for a generic solution. You have to program the detection for yourself.

See Plagiarism detection

Topic		Replies	Views
Plagiarism detection Elasticsearch	6	4762	July 5, 2017
Near duplicate document detection Elasticsearch	2	1396	August 12, 2020
Similar document detection Elasticsearch	2	553	March 18, 2017
[RFC] idea for a near duplicate filter Elasticsearch	2	1264	July 6, 2017
Finding documents _almost_ the same Elasticsearch	5	2757	December 13, 2016

Duplicate documents detection in Elasticsearch

Related topics