Find and delete duplicate documents

See the description at https://qbox.io/blog/minimizing-document-duplication-in-elasticsearch; it suggests something like

curl -XGET 'http://localhost:9200/employeeid/info/_search?pretty=true' -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "name",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

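This returns the "name" values shared by two or more documents, along with the matching documents for each value, which should give you what you need to pick out the duplicates to delete.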
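The query only finds the duplicates; deleting them is a separate step. As a rough sketch of that step, the Python below walks a response shaped like the one this aggregation returns and collects the _id of every document except the first in each bucket. The function name ids_to_delete and the sample response are made up for illustration, and "keep the first hit" is just one possible policy. Note that top_hits returns only 3 documents per bucket unless you set "size" inside it, so larger groups of duplicates would need a bigger value.

```python
def ids_to_delete(response, agg_name="duplicateCount", hits_name="duplicateDocuments"):
    """Collect the _id of every duplicate except the first hit in each bucket.

    `response` is the parsed JSON body of the search above. The aggregation
    names default to the ones used in that query.
    """
    doomed = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        # Keep the first document of the group, mark the rest for deletion.
        doomed.extend(hit["_id"] for hit in hits[1:])
    return doomed


# Hypothetical, trimmed-down response used only to show the shape.
sample = {
    "aggregations": {
        "duplicateCount": {
            "buckets": [
                {
                    "key": "alice",
                    "doc_count": 3,
                    "duplicateDocuments": {
                        "hits": {"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]}
                    },
                },
                {
                    "key": "bob",
                    "doc_count": 2,
                    "duplicateDocuments": {
                        "hits": {"hits": [{"_id": "7"}, {"_id": "9"}]}
                    },
                },
            ]
        }
    }
}

print(ids_to_delete(sample))  # prints ['2', '3', '9']
```

Each ID in the result could then be removed with a DELETE request against the same index and type (for example /employeeid/info/2), or in one go through the bulk API. It is worth reviewing the list before deleting anything.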
