Would Elasticsearch be suitable as a long-term storage system (in addition to being a query engine) for short- to mid-term offline batch analytics using Apache Spark? We're talking about petabyte-scale retention over a year, with terabytes of new incoming data fed into a Kafka cluster and routed to Logstash, then ES. I'm worried that ES's overhead would make a huge difference in storage space usage compared with alternatives such as compressed data on HDFS/HBase. In other words, does ES offer a similarly consistent, automatic mechanism for "archiving" older data on the cluster?
Thanks in advance.