So my key is basically "100_CUSTOM_TYPE_300_bla_foo", which is a pretty bad key: it's neither sortable nor compressible, but it is my key. I thought about just doing an MD5 on it and using the result, but I am not sure if this is the optimal solution.
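For reference, a minimal sketch of what that MD5 idea would look like in Python with the standard hashlib module (the key string is the one from the post; everything else is just illustrative):

```python
import hashlib

# The custom key from the post
key = "100_CUSTOM_TYPE_300_bla_foo"

# MD5 gives a fixed-length, hex-encoded id, but it is no more
# sortable or compressible than the original key string.
doc_id = hashlib.md5(key.encode("utf-8")).hexdigest()
print(doc_id)  # 32-character hex string
```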
I looked at how Elasticsearch implements its ID generation, which is based on Flake IDs. Those are time-based, which doesn't fit my case.
Sure, I need to index the same document again, essentially updating the existing one.
I load my data from S3 and index it into Elasticsearch. Those IDs are already referenced by other entities in the system, so the document in S3 must already contain the ID before it is indexed.
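To illustrate that flow, here is a rough sketch using the boto3 and elasticsearch Python clients. The bucket, object key, index name, and the assumption that the document JSON carries an `id` field are all hypothetical, not from the original post:

```python
import json
import boto3
from elasticsearch import Elasticsearch

# Hypothetical bucket, object key, and index names
s3 = boto3.client("s3")
es = Elasticsearch("http://localhost:9200")

obj = s3.get_object(Bucket="my-bucket", Key="docs/100_CUSTOM_TYPE_300_bla_foo.json")
doc = json.loads(obj["Body"].read())

# The document already carries its own id, so index with an explicit _id
# instead of letting Elasticsearch auto-generate one. Re-indexing the same
# id simply overwrites (updates) the existing document.
es.index(index="my-index", id=doc["id"], document=doc)
```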
OK then, so what's the problem with your existing ID approach? You don't really want to sort on it, and compressibility would be less than ideal even with the default auto-generated IDs.