I'm looking at using the frozen tier, backed by S3, for about 1-2 TB of an index, but this data will mostly be accessed by an id field (with the occasional alternative query). The ILM lifecycle is very convenient operationally, since the data exists only in ES today. The design of keeping only part of the index in a local cache is also appealing.
However, from what I've read, the frozen tier is optimized solely for time series data, which raises a few questions:
- Does this imply that an id lookup would scan the entire S3 bucket and load the full index into a node?
- Or would it scan the entire bucket once and then cache only the id-related parts of the index?
- Is there a mechanism to hint that the id-related parts of the index should stay in the cache?
- Can this id cache be shared across several frozen-tier nodes?
Thanks!