I have a problem with duplicate documents, so I am using the Logstash method described here:
It seems to do the job: after deduplication I get a smaller number of documents. But then I hit another problem. When I use SHA256 for hashing, the index takes about twice as much space as the original, while with MURMUR3 it takes a little less space, which is what I would expect (fewer documents -> less space).
The mapping is identical, and the documents themselves look the same, apart from the much longer "_id" values with SHA256.
I cannot use MURMUR3 because some of my indexes contain more documents than MURMUR3 hashing can generate unique IDs for.
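To illustrate the `_id` size difference, here is a small Python sketch. It is an assumption on my part that the Logstash fingerprint filter's MURMUR3 method produces a 32-bit value; CRC32 is used below only as a stand-in for any 32-bit hash, since the point is the length of the resulting `_id`, not the hash algorithm itself:

```python
import hashlib
import zlib

doc = b"example firewall connection record"  # hypothetical document content

# SHA-256 fingerprint: the hex digest is always 64 characters per _id
sha_id = hashlib.sha256(doc).hexdigest()

# Any 32-bit hash (CRC32 here, standing in for a 32-bit MURMUR3)
# fits in at most 8 hex characters per _id
small_id = format(zlib.crc32(doc), "x")

print(len(sha_id))    # 64
print(len(small_id))  # at most 8
```

So every SHA256-based `_id` is roughly eight times longer than a 32-bit one, and the `_id` is stored (and indexed) for every document, which contributes to the larger index size.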
health status index                                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   telegraf-firewallconnections-2020.01       SIbOVfRCSfCokaEX1ILBZg   1   0    1237508            0     74.2mb         74.2mb
green  open   telegraf-firewallconnections-2020.01mur3   N6P5fiM-Sm6UfEQlOu2aVQ   1   1    1188297          436    119.4mb         59.8mb
green  open   telegraf-firewallconnections-2020.01sha256 3nMtJTKcSnCQZxFRGvcrzQ   1   1    1188470          242    305.4mb        152.7mb
So why does SHA256 take up so much space?