Data Stream data retrieval and compression

I'm working on a data stream and have created a test ILM policy. I have a couple of questions about it.

  1. Is there a large difference in retrieval time between indices in the hot, warm, and cold phases? I inserted a few documents and used the shrink API to reduce the number of shards to 1 in the warm phase, but I'm not seeing much difference in retrieval time between the hot, warm, and cold phases. Is that expected, or will it differ with a large amount of data? How much of a time difference can we expect between the phases?

  2. Is there any compression available for a data stream? The data in the cold phase is no longer needed for searching. Can we zip that data to free up disk space on the cluster and store more data in the cold phase? Does the size reduce when an index moves to the cold phase? Or is the shrink API what actually reduces the size of the index? (I'm not sure whether reducing the number of shards reduces the size of the index.)

How exactly are you retrieving the data? Are the phases on the same hosts, or do you have different hardware profiles for each phase?

Elasticsearch compresses by default. You cannot zip the underlying data without losing access to it.

The _shrink API is used to reduce the number of shards; it should also help to minimise the size of the index.
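
As a rough illustration, the shrink step can be combined with a force merge that switches the index to the `best_compression` codec, which generally trades some CPU for smaller stored fields. The sketch below only builds the request body for `PUT _ilm/policy/<name>`; the function name, the phase ages, and the rollover condition are placeholder assumptions, not values from this thread.

```python
import json
from typing import Any, Dict


def build_ilm_policy(warm_after: str = "7d", cold_after: str = "30d") -> Dict[str, Any]:
    """Build an ILM policy body (hypothetical ages, adjust to your retention needs)."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    # Roll over the data stream's write index once a day (assumed value).
                    "actions": {"rollover": {"max_age": "1d"}}
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        # Same effect as calling the _shrink API by hand.
                        "shrink": {"number_of_shards": 1},
                        # Merge down to one segment and recompress stored fields.
                        "forcemerge": {
                            "max_num_segments": 1,
                            "index_codec": "best_compression",
                        },
                    },
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {"set_priority": {"priority": 0}},
                },
            }
        }
    }


policy = build_ilm_policy()
print(json.dumps(policy, indent=2))
```

How much disk this saves depends on the data, so it is worth comparing the index size before and after on a realistic sample.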

They are on the same host. I just want to know whether retrieving data from the warm and cold phases takes a different amount of time. I ran a _search query from Postman against an index in the hot phase and one in the warm phase, and I couldn't see much difference. But with a large amount of data, how large will the time lag between the phases be?

So in the cold phase, we can't reduce its size much further using any compression technology?

If the phases are all on the same host, then no, you shouldn't expect a meaningful difference in retrieval time.
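
To compare retrieval time across phases more systematically than with a single Postman request, one option is to repeat the same search several times per index and look at the median. This is a minimal sketch: `run_search` is a hypothetical callable that sends one `_search` request and returns the parsed JSON response, and `fake_search` is a stand-in so the code runs without a cluster. It relies on the `took` field (milliseconds) that Elasticsearch includes in search responses.

```python
import statistics
import time
from typing import Any, Callable, Dict, List


def summarise_latency(run_search: Callable[[], Dict[str, Any]], repeats: int = 20) -> Dict[str, float]:
    """Run the same search repeatedly and return median latencies in ms."""
    took: List[float] = []  # server-side time reported in the response
    wall: List[float] = []  # client-side round-trip time
    for _ in range(repeats):
        start = time.perf_counter()
        response = run_search()
        wall.append((time.perf_counter() - start) * 1000.0)
        took.append(float(response["took"]))
    return {
        "took_ms": statistics.median(took),
        "wall_ms": statistics.median(wall),
    }


def fake_search() -> Dict[str, Any]:
    # Stand-in for a real _search call against a hot, warm, or cold index.
    return {"took": 3, "hits": {"hits": []}}


summary = summarise_latency(fake_search, repeats=5)
print(summary["took_ms"])
```

Running this once per phase, with the same query and a realistic data volume, gives numbers that can be compared directly; repeated runs may be served partly from caches, so the first few iterations can look slower than the rest.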

