Hello All,
I am a newbie to Elasticsearch.
I am interested in using FSCrawler to index all my files, then using Elasticsearch to find duplicates.
I'm thinking of using FSCrawler's checksum setting with MD5 to hash all the files: https://fscrawler.readthedocs.io/en/fscrawler-2.5/admin/fs/local-fs.html#file-checksum
Then I would use something like a terms aggregation on the checksum to search for duplicates: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
Or would something like this be a better approach?
Or is there a better way?
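
To make the checksum idea concrete, here is a rough sketch of the search body I have in mind, written in Python. It is only a sketch under assumptions: I am assuming FSCrawler stores the hash in a field called `file.checksum` and that the field is mapped as a keyword. The helper names `build_duplicates_query` and `extract_duplicates` and the aggregation name `duplicate_checksums` are made up for illustration. The idea is a terms aggregation with `min_doc_count` set to 2, so only checksums shared by at least two documents come back.

```python
import json

# Assumption: FSCrawler writes the checksum to "file.checksum" (keyword field).
CHECKSUM_FIELD = "file.checksum"


def build_duplicates_query(max_buckets=1000):
    """Build a _search body that groups documents by checksum (hypothetical helper)."""
    return {
        "size": 0,  # no individual hits needed, only the aggregation
        "aggs": {
            "duplicate_checksums": {
                "terms": {
                    "field": CHECKSUM_FIELD,
                    "min_doc_count": 2,   # keep only checksums seen at least twice
                    "size": max_buckets,  # upper bound on returned buckets
                }
            }
        },
    }


def extract_duplicates(response):
    """Map checksum -> number of documents, from a terms aggregation response."""
    buckets = response["aggregations"]["duplicate_checksums"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


if __name__ == "__main__":
    # Print the body so it can be pasted into a _search request.
    print(json.dumps(build_duplicates_query(), indent=2))
```

The printed body would be sent to the `_search` endpoint of whatever index FSCrawler writes to. Note that the terms aggregation only returns up to `size` buckets, so with many duplicated files the limit may need raising.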