We store fairly large JSON documents in Elasticsearch, and we will be adding tens of millions of documents every month.
We found that storing 30 million documents eats up 2 TB of storage.
Since the format is JSON, there is a lot of repetition in each document (every key is repeated in every document). So we started shortening the keys, which reduced the size to 300 GB.
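For illustration, the shortening step is essentially a recursive key rename like this (a simplified sketch; the actual key map covers our whole schema, and `transactionTimestamp`/`ts` here is just a made-up second entry):

```python
import json

# Hypothetical mapping from verbose keys to short ones.
KEY_MAP = {
    "baseCurrencyAmount": "baCA",
    "transactionTimestamp": "ts",
}

def shorten_keys(obj):
    """Recursively replace dict keys according to KEY_MAP."""
    if isinstance(obj, dict):
        return {KEY_MAP.get(k, k): shorten_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [shorten_keys(v) for v in obj]
    return obj

doc = {
    "baseCurrencyAmount": 12.5,
    "items": [{"transactionTimestamp": "2024-01-01T00:00:00Z"}],
}
print(json.dumps(shorten_keys(doc)))
# → {"baCA": 12.5, "items": [{"ts": "2024-01-01T00:00:00Z"}]}
```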
That's good, but I feel it's still far too much. Without the JSON boilerplate we could easily reduce it by another factor of 10.
On the other hand, I feel that mangling JSON keys, turning something human-readable like 'baseCurrencyAmount' into something barely readable like 'baCA', is generally the wrong direction.
Is there any hint you can give?
And a side question: it seems that Elasticsearch compresses each document separately. If it compressed them jointly somehow, my guess is that key lengths would no longer matter.
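The intuition behind this side question can be checked with plain zlib (not Elasticsearch's actual storage codec, just an illustration): when many documents sharing the same long keys are compressed as one stream, repeated keys after the first occurrence become cheap back-references, whereas compressing each document separately pays for the keys every time.

```python
import json
import zlib

# Synthetic documents that all share the same long keys.
docs = [
    json.dumps({
        "baseCurrencyAmount": i * 1.5,
        "transactionTimestamp": f"2024-01-{i % 28 + 1:02d}",
    })
    for i in range(1000)
]

# Compress every document on its own: key names are stored each time.
separate = sum(len(zlib.compress(d.encode())) for d in docs)

# Compress all documents as a single stream: repeated key names are
# replaced by back-references into the shared compression window.
joint = len(zlib.compress("\n".join(docs).encode()))

print(separate, joint)  # joint is a small fraction of separate
```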