ES as a long-term storage system inside analytics architecture?

ELKnewbie · May 5, 2017, 10:42am

Would Elasticsearch be suitable as a long-term storage system (in addition to being a query system) for short- to mid-term offline batch analytics with Apache Spark? We're talking about petabyte-scale retention over a year, with terabytes of new incoming data fed into a Kafka cluster and routed to Logstash, then ES. I'm worried that ES's overhead would make a huge difference in storage space usage compared with alternatives such as compressed data on HDFS/HBase. In other words, is there a similarly consistent, automatic management system to "archive" older data on an ES cluster?

Thanks in advance.

ELKnewbie · May 7, 2017, 10:58am

So I guess the answer is: no?

rusty · May 10, 2017, 8:37am

Hello, it really depends. You should make a similar estimate to see whether it fits your purposes (do you need to index the data, do you need doc_values, do you need replication, is the best_compression codec suitable, and so on). IMHO, for now ES is too memory-hungry for petabyte-scale solutions.
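
To make that estimate concrete, here is a rough sketch in Python, not a sizing tool. The function names, the `message_id` field, and every number in the usage lines are hypothetical. In particular, `index_ratio` (on-disk size of one primary copy divided by raw input size) is an assumption you would have to measure by indexing a sample of your own data with the settings you intend to use. The mapping layout assumes the typeless form used by recent Elasticsearch versions; 5.x-era clusters nest it under a mapping type.

```python
import json


def estimate_cluster_storage_tb(raw_tb_per_day, retention_days, index_ratio, replicas):
    """Estimate total on-disk size in TB for the whole retention window.

    index_ratio: measured size of one primary copy / raw input size
                 (hypothetical here, depends on mappings and codec).
    replicas:    number of replica copies per primary shard.
    """
    primary_tb = raw_tb_per_day * retention_days * index_ratio
    return primary_tb * (1 + replicas)


def index_body(replicas, index_field=True, doc_values=True):
    """Build an index-creation body covering the knobs listed above."""
    return {
        "settings": {
            "index": {
                "codec": "best_compression",     # stronger compression, slower stored-field reads
                "number_of_replicas": replicas,  # 0 means no redundancy inside ES
            }
        },
        "mappings": {
            "properties": {
                # hypothetical field; switch off what you never search or aggregate on
                "message_id": {
                    "type": "keyword",
                    "index": index_field,
                    "doc_values": doc_values,
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical inputs: 2 TB/day of raw data kept for 365 days.
    for ratio, replicas in [(1.0, 1), (0.5, 0)]:
        total = estimate_cluster_storage_tb(2, 365, ratio, replicas)
        print(f"ratio={ratio} replicas={replicas}: {total:.0f} TB on disk")
    print(json.dumps(index_body(replicas=0, doc_values=False), indent=2))
```

Running the same calculation with the compression ratio you get on HDFS gives you the side-by-side comparison you asked about.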

system · June 7, 2017, 8:51am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.