Storage optimization for ElasticSearch storing large data


As part of devising a solution for a data store that ingests roughly 100TB per month and retains about a year of logs, I was thinking of keeping hot data, let's say the most recent month, on locally attached disk/JBOD, and the remaining 11 months of data on SAN. The idea is that the large SAN-backed store would be used only for retrieving past data, while all current logs are written to locally attached disk for faster write speeds.

Any thoughts, suggestions?

Hello Satish,

Please have a look at the blog post titled “Hot-Warm” Architecture, which describes exactly this setup, and let us know if you have any questions.
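For anyone finding this thread later, the core of that architecture is tagging each node with an attribute and then pinning newly created indices to the hot tier with shard allocation filtering. A minimal sketch (the attribute name `box_type` is just the convention from the post, not anything Elasticsearch requires, and the exact setting key varies by ES version):

```yaml
# elasticsearch.yml on the hot nodes (fast local disk / JBOD)
node.attr.box_type: hot

# elasticsearch.yml on the warm nodes (SAN-backed storage)
node.attr.box_type: warm
```

New indices can then be steered onto the hot nodes with an index template that sets `index.routing.allocation.require.box_type: hot` for your log index pattern.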

Thanks @json. Interesting article. Have got a question:

The article mentions: "Elasticsearch will automatically migrate the indices over to the warm nodes."

However, there is also a reference to Curator doing the migration. Do we really need Curator for that, or will ES do the hot/warm migration automatically as the article suggests?

That is a good question. Elasticsearch does not do hot/warm migration automatically. You can do it on a time basis with a cron job that calls the REST API, as mentioned in the section labeled "Warm Data Nodes", or with Curator, as mentioned in the example.
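To make the cron-based approach concrete, here is a sketch of the kind of call such a job would make (the index name and the `box_type` attribute are assumptions carried over from the blog post; adjust them to your naming scheme and ES version). Updating the allocation setting is what triggers Elasticsearch to relocate the index's shards over to the warm nodes:

```shell
# Hypothetical nightly cron job: retag yesterday's index so its shards
# relocate from the hot (local-disk) nodes to the warm (SAN-backed) nodes.
curl -XPUT "localhost:9200/logs-2016.05.01/_settings" \
  -d '{"index.routing.allocation.require.box_type": "warm"}'
```

Curator does essentially the same thing via its `allocation` action, so the choice between cron + curl and Curator is mostly about which tooling you prefer to maintain.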