I'm looking into ways of deduplicating data in our ES cluster (the same document may appear in multiple indices under the same alias). One potential method seems to be writing a custom plugin that filters records with identical _id values out of the response.
Has anyone implemented something like this and can tell me whether it is feasible?
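For reference, the filtering described above boils down to "keep the first hit per _id". Here is a minimal client-side sketch of that logic in Python, not a plugin. It assumes a standard search response dict with hits under `response["hits"]["hits"]`. It only deduplicates within the returned page, so `hits.total` and pagination will still count the duplicates.

```python
from typing import Any


def dedupe_hits_by_id(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep the first hit for each _id and drop later hits with the same _id.

    With the default sort, hits arrive in score order, so the kept hit is
    the highest-scoring copy of that document across the aliased indices.
    """
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for hit in response["hits"]["hits"]:
        if hit["_id"] in seen:
            continue  # same document already seen from another index
        seen.add(hit["_id"])
        unique.append(hit)
    return unique
```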
Also, all of the examples I've seen create custom REST endpoints for their plugins. If I want my plugin to apply as the default behavior, is it possible to register it under the existing /_search endpoint?
It is not possible to register more than one handler for a given endpoint, and /_search is already registered by core Elasticsearch. Have you looked at field collapsing?
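As a rough illustration, a collapsed search body might be built like this in Python. It assumes each document stores its logical ID in a keyword field, here the hypothetical `doc_id`, because collapsing needs a keyword or numeric field with doc values and `_id` is generally not usable for that. The alias name `my-alias` and the `title` match query are placeholders.

```python
import json
from typing import Any

# Hypothetical keyword field copying each document's logical ID.
# Assumption: _id itself cannot be collapsed on, so the ID is indexed
# again in a keyword field that has doc values.
ID_FIELD = "doc_id"


def collapsed_search_body(query: dict[str, Any], size: int = 10) -> dict[str, Any]:
    """Build a /_search body that returns one top hit per ID_FIELD value."""
    return {
        "size": size,
        "query": query,
        # Group hits by ID_FIELD and return only the top hit of each group.
        "collapse": {"field": ID_FIELD},
    }


body = collapsed_search_body({"match": {"title": "elasticsearch"}})
# Send as: POST /my-alias/_search  (my-alias is a placeholder alias name)
print(json.dumps(body, indent=2))
```

Note that `hits.total` in the response still reports the number of hits without collapsing, so it will include the duplicates.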