I want to save about 50TB of data per day from Logstash and search it with Elasticsearch. Is there a good solution for this?
According to the official introductions, I can only think of the three approaches below, but I am not sure whether they are correct, because of the large data volume. Could anyone help me please? Thanks a lot.
Solution #1:
1) Logstash outputs data into Elasticsearch.
2) Elasticsearch uses snapshots to keep a backup in Hadoop (HDFS), and can restore from it (rough sketch below).
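Just to make step 2 concrete, here is a minimal sketch of registering an HDFS snapshot repository and taking a snapshot over the REST API, driven from Python. It assumes the repository-hdfs plugin is installed; the endpoints, repository name, snapshot name, and index pattern are placeholders, not anything prescribed.

```python
import requests

ES = "http://localhost:9200"  # assumed Elasticsearch endpoint

# Register an HDFS snapshot repository (requires the repository-hdfs plugin).
requests.put(
    f"{ES}/_snapshot/hdfs_backup",
    json={
        "type": "hdfs",
        "settings": {
            "uri": "hdfs://namenode:8020/",    # assumed NameNode address
            "path": "/backups/elasticsearch",  # HDFS directory to hold snapshots
        },
    },
).raise_for_status()

# Snapshot one day's worth of Logstash indices into that repository.
requests.put(
    f"{ES}/_snapshot/hdfs_backup/snapshot-2015.01.01",
    json={"indices": "logstash-2015.01.01"},
).raise_for_status()
```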
Solution #2:
1) Logstash outputs data into Hadoop (HDFS).
2) Mount HDFS as a local filesystem with NFS.
3) Elasticsearch indexes the data from the local HDFS mount (rough sketch below).
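And a rough sketch of what step 3 could look like in practice, assuming the HDFS NFS gateway is mounted at /mnt/hdfs and the Logstash output there is newline-delimited JSON; the paths, index name, and client endpoint are made up for illustration.

```python
import glob
import json

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])  # assumed Elasticsearch endpoint

def actions():
    # Walk the NFS-mounted HDFS directory (placeholder path) and yield one
    # bulk action per JSON line found in the files Logstash wrote there.
    for path in glob.glob("/mnt/hdfs/logstash/*.json"):
        with open(path) as f:
            for line in f:
                yield {
                    "_index": "logstash-from-hdfs",
                    "_source": json.loads(line),
                }

# Bulk-index everything that is currently on the mount.
helpers.bulk(es, actions())
```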
Solution #3:
1) Logstash outputs data into Hadoop (HDFS).
2) Create a table, and an external table which is STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler', in Hive (rough sketch below).
3) Load the data into the table.
4) Elasticsearch can then index the data.
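For steps 2 and 3, a minimal sketch driven from Python with PyHive might look like the following; the HiveServer2 host, table names, columns, and the es.* settings are assumptions, and the HDFS-backed source table (logs_hdfs here) is expected to already exist.

```python
from pyhive import hive

# Assumed HiveServer2 endpoint.
conn = hive.connect(host="hive-server", port=10000)
cur = conn.cursor()

# External table backed by ES-Hadoop: rows written to it are indexed into
# Elasticsearch (es.resource / es.nodes are illustrative values).
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logs_es (ts STRING, message STRING)
    STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
    TBLPROPERTIES ('es.resource' = 'logstash/event',
                   'es.nodes'    = 'localhost:9200')
""")

# Load the data: copy the HDFS-backed source table into the ES-backed table,
# which pushes the rows into Elasticsearch for indexing.
cur.execute("INSERT OVERWRITE TABLE logs_es SELECT ts, message FROM logs_hdfs")
```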
Why do you want to use Hadoop? It might sound like a strange question, but do consider it.
ES-Hadoop is useful when the data is already in Hadoop and you want to get it into ES, or when you have a computational grid like Spark where you crunch numbers and need to tap into the data in ES.
Since you are using Logstash, it looks like you already have the means to move data from your source to Elasticsearch. So why go through Hadoop?
Note that involving another system means allocating the necessary resources: your 50TB would have to sit in Hadoop (until it is consumed) and in Elasticsearch, plus the network and CPU overhead of moving it to HDFS and then on to Elasticsearch.