Let's say we have 300 million documents. An average document ID looks like this: aaaaaaaaaaaaaaaa_000000000_0_0_0_0
There are thousands of MGET operations every second for these documents.
We wanted to change the document ID to make it more readable, but our colleague says that the IDs are already too long and should be shorter because length affects performance and heap usage.
Is that true? Is there a resource where we could learn more about this? Thank you.
The structure of document IDs does indeed affect heap usage. We started to investigate some alternatives to the auto-generated IDs that Elasticsearch creates if you don't give an explicit ID here:
It's not purely a function of length, because the changes discussed in that thread do not affect the lengths of the IDs. As with all performance questions, there is no substitute for careful benchmarking to determine the true effects of any change.
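To make the "structure, not just length" point a bit more concrete, here is a rough Python sketch. It is a toy model only: it does not measure Elasticsearch heap usage, and the helper names (`raw_id_bytes`, `front_coded_size`, `sequential_ids`, `scattered_ids`) are made up for illustration. It compares the raw byte count of a set of IDs with the bytes left over once each sorted ID drops the prefix it shares with its predecessor, which is one simple way that IDs of identical length can compress very differently.

```python
import hashlib


def raw_id_bytes(doc_id: str, doc_count: int) -> int:
    """Raw UTF-8 size of doc_count IDs that all have the length of doc_id."""
    return len(doc_id.encode("utf-8")) * doc_count


def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_coded_size(ids) -> int:
    """Bytes remaining when each sorted ID drops the prefix shared with the previous one.

    This is a simplified stand-in for prefix sharing, not Lucene's actual format.
    """
    total = 0
    prev = b""
    for cur in sorted(i.encode("utf-8") for i in ids):
        total += len(cur) - shared_prefix_len(prev, cur)
        prev = cur
    return total


def sequential_ids(n: int):
    """Hypothetical IDs with a fixed prefix and a zero-padded counter (13 chars)."""
    return [f"doc_{i:09d}" for i in range(n)]


def scattered_ids(n: int):
    """Hypothetical IDs of the same length (13 chars) with no shared structure."""
    return [hashlib.md5(str(i).encode("utf-8")).hexdigest()[:13] for i in range(n)]


if __name__ == "__main__":
    example = "aaaaaaaaaaaaaaaa_000000000_0_0_0_0"
    print("raw bytes for 300M IDs:", raw_id_bytes(example, 300_000_000))
    print("sequential, front-coded:", front_coded_size(sequential_ids(1000)))
    print("scattered, front-coded:", front_coded_size(scattered_ids(1000)))
```

Both sample sets have exactly the same total length, yet the sequential one shrinks far more under prefix sharing. That is only an intuition for why ID structure can matter; the real effect on heap and MGET latency still has to be benchmarked on your own cluster and data.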