Save and search data with es & hadoop


#1

I want to save about 50TB of data per day from logstash and search it with elasticsearch. Is there a good solution for this?
According to the official introductions, I can only think of the three approaches below, but I am not sure whether they are correct given the large data volume. Could anyone help me please? Thanks a lot.

Solution#1:
1)Logstash outputs data into elasticsearch.
2)Elasticsearch uses snapshot to keep a backup in hadoop(hdfs), and can restore it.
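
As a rough illustration of step 2, here is a minimal sketch of registering an HDFS snapshot repository through the snapshot API. It assumes the HDFS repository plugin is installed on every node. The host, repository name, namenode URI and path are placeholders rather than values from this thread, and the request is only built, not sent.

```python
import json
import urllib.request


def build_hdfs_repo_request(es_url, repo_name, hdfs_uri, hdfs_path):
    """Build (but do not send) the PUT request that registers an HDFS snapshot repository."""
    body = {
        "type": "hdfs",  # repository type provided by the HDFS repository plugin
        "settings": {
            "uri": hdfs_uri,    # namenode address (placeholder)
            "path": hdfs_path,  # directory inside HDFS that holds the snapshots (placeholder)
        },
    }
    return urllib.request.Request(
        url=f"{es_url}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# All four arguments are hypothetical example values.
req = build_hdfs_repo_request(
    "http://localhost:9200",
    "my_hdfs_repo",
    "hdfs://namenode:8020/",
    "elasticsearch/snapshots",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req)  # uncomment to actually send it to a running cluster
```

Once the repository exists, snapshots and restores go through the same `_snapshot` endpoint. Note that this only gives a backup in HDFS; searches still run against the data held in Elasticsearch itself.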

Solution#2:
1)Logstash outputs data into hadoop(hdfs).
2)Mount hdfs as a local fs with NFS.
3)Elasticsearch uses the locally mounted hdfs.

Solution#3:
1)Logstash outputs data into hadoop(hdfs).
2)Create a table and an external table which is STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler' in Hive.
3)Load data into table.
4)Elasticsearch can index the data.


#2

From reading Costin's answers to other questions, it seems that solution#2 is not a good option because of performance and other issues.


#3

Could anyone help me please? Thanks a lot!


(Costin Leau) #4

Why do you want to use Hadoop? It might sound like a weird question, but do consider it.
ES-Hadoop is useful when the data is in Hadoop and you try to get it into ES. Or you have a computational grid like Spark where you crunch numbers and need to tap into the data in ES.

By using logstash, it looks like you already have the means to move data from your source to Elasticsearch. So why go through Hadoop?
Note that involving a different system means allocating the necessary resources. Your 50TB would have to sit in Hadoop (until it is consumed) and in Elasticsearch. On top of that comes the network and CPU overhead of moving the data to HDFS and then to Elasticsearch.
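
To put rough numbers on that overhead, here is a back-of-the-envelope sketch. Only the 50TB/day figure comes from the question. The HDFS replication factor of 3, the single Elasticsearch replica, and the idea that indexed data takes about as much space as the raw input are all assumptions for illustration, and the function names are made up.

```python
TB = 10**12                       # decimal terabyte, in bytes
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(tb_per_day):
    """Average ingest rate in MB/s needed to keep up with a daily volume."""
    return tb_per_day * TB / SECONDS_PER_DAY / 10**6


def daily_footprint_tb(tb_per_day, hdfs_replication=3, es_replicas=1):
    """Disk consumed per day in each system.

    Assumes indexed size is roughly equal to raw size, which is a simplification.
    """
    hdfs = tb_per_day * hdfs_replication   # every block stored hdfs_replication times
    es = tb_per_day * (1 + es_replicas)    # primary shards plus replica copies
    return hdfs, es


rate = sustained_rate_mb_per_s(50)
hdfs_tb, es_tb = daily_footprint_tb(50)
print(f"{rate:.0f} MB/s sustained ingest")  # 579 MB/s
print(f"{hdfs_tb} TB/day in HDFS, {es_tb} TB/day in Elasticsearch")
```

Under these assumptions, routing through Hadoop first means the same bytes cross the network twice and occupy disk in both systems until the HDFS copy is consumed.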

