Using replicas for long-term storage on a SAN


We are planning a production implementation of ELK to replace our current logging solution. We intend to create daily indices, which would make them easy to manage.

Here is the issue: we would like to keep 90 days' worth of logs/indices on faster disk and 365 days' worth on slower disk, but we need all of the logs to remain searchable.

Is it possible to have two replica shards for every primary shard and allocate one of the replicas to the slow disk (which would also have an ES instance running on it)?

Can we configure the search engine not to read from the slower-disk node unless we specifically ask it to?



Basically, you need to move any "old" data onto nodes that are cold; you cannot pin a specific shard type (i.e. replica or primary) to a specific group of machines.

Can you explain what you mean by cold? Would cold nodes be searchable?

Cold nodes are those with more, slower storage - i.e. spinning disks versus SSDs.

Yes they are still searchable.

You can do this by using a Hot/Warm architecture, where a set of nodes with fast storage handles all indexing as well as querying of the most recent data, and a different set of nodes, often with slower disks, handles older data that is generally read-only.
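A minimal sketch of how this is usually wired up, assuming a recent ES version with `node.attr.*` custom attributes; the attribute name `box_type` and the index names are just conventions, not built-ins:

```shell
# Tag nodes in each node's elasticsearch.yml:
#   hot nodes:  node.attr.box_type: hot
#   warm nodes: node.attr.box_type: warm

# Index template so new daily indices are allocated to hot nodes
# ("logs-*" pattern is an example):
curl -XPUT 'localhost:9200/_template/logs' -H 'Content-Type: application/json' -d '{
  "template": "logs-*",
  "settings": { "index.routing.allocation.require.box_type": "hot" }
}'

# Once an index ages past the hot window (e.g. 90 days), retarget it;
# Elasticsearch relocates its shards to the warm nodes automatically:
curl -XPUT 'localhost:9200/logs-2016.01.01/_settings' -H 'Content-Type: application/json' -d '{
  "index.routing.allocation.require.box_type": "warm"
}'
```

Because allocation filtering works at the index level, this naturally fits daily indices: all shards of an old index move together, and the index stays fully searchable on the warm tier.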

Thanks Christian, looks like this will solve my problem.

Will searches be quicker on the hot nodes? What configuration changes are needed? The article doesn't mention them.

Also how much compression does elasticsearch offer?

Search is usually quicker on hot nodes as they have more/faster resources to work with.

ES compresses data by default; the ratio depends entirely on the dataset.
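For older, read-only indices you can trade some CPU for disk space by switching the stored-fields codec from the default LZ4 to DEFLATE. A sketch, assuming an ES version that supports `best_compression` (2.0+); the index name is an example, and the index must be closed while the setting is changed:

```shell
# Switch an aged-out index to the higher-ratio codec:
curl -XPOST 'localhost:9200/logs-2016.01.01/_close'
curl -XPUT 'localhost:9200/logs-2016.01.01/_settings' -H 'Content-Type: application/json' -d '{
  "index.codec": "best_compression"
}'
curl -XPOST 'localhost:9200/logs-2016.01.01/_open'
```

Note that the codec only applies to newly written segments, so existing data is not recompressed until segments are rewritten (e.g. by a force merge).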

Is there a way to remove replicas once the indices are moved to the cold nodes?

Just set them to 0 using the APIs.
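For example, via the update index settings API (the index name is an example):

```shell
# Drop all replicas for an aged-out index:
curl -XPUT 'localhost:9200/logs-2016.01.01/_settings' -H 'Content-Type: application/json' -d '{
  "index.number_of_replicas": 0
}'
```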