Elasticsearch & Hadoop


(Jlogan) #1

Hi, I recently needed to integrate ES and Hadoop via elasticsearch-hadoop. How does reading data from ES into Hadoop perform, and how does reading data from Hadoop into ES perform? Let's say I have 100 GB of data per day to process. Which should I choose? 1. data -> Hadoop -> ES, or 2. data -> ES -> Hadoop.
Thanks


(Mike Barretta) #2

@JIELOGAN, is your question about which version of Hadoop to use, v1 or v2?

If so, that's not an ES-Hadoop question, as the answer depends on which version of Hadoop you are running, 1.x or 2.x. If you're starting fresh with Hadoop, I'd definitely recommend going with the most current version, 2.8 (assuming you're not limited to a given version by some other requirement).

With regard to write performance from Hadoop into Elasticsearch, see https://www.elastic.co/guide/en/elasticsearch/hadoop/current/arch.html for details. In short, the level of parallelism for writing is based on the number of splits and mappers in your dataset/cluster: more splits == more mappers == more parallelism.
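
To make the splits/mappers relationship concrete, here is a toy Python sketch. It is not ES-Hadoop code: the helper names (`make_splits`, `bulk_body`, `write_in_parallel`) and the `logs`/`event` index and type are hypothetical, and threads merely stand in for mappers. Each split is turned into its own bulk payload independently, so the number of splits bounds how many writers can work at once.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def make_splits(records, num_splits):
    """Partition records round-robin into num_splits lists (stand-ins for input splits)."""
    return [records[i::num_splits] for i in range(num_splits)]


def bulk_body(split, index="logs", doc_type="event"):
    """Build a bulk-style NDJSON payload for one split (index/type names are made up)."""
    if not split:
        return ""
    lines = []
    for doc in split:
        # Action line followed by the document source, one JSON object per line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def write_in_parallel(records, num_splits):
    """One worker per split: more splits means more payloads built concurrently."""
    splits = make_splits(records, num_splits)
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        return list(pool.map(bulk_body, splits))


records = [{"id": i} for i in range(10)]
payloads = write_in_parallel(records, num_splits=4)
print(len(payloads))  # one payload per split
```

In a real job the payloads would be sent to Elasticsearch by the connector rather than just collected in a list; the sketch only shows why the split count caps write parallelism.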


(James Baiera) #3

@JIELOGAN I would use Hadoop 2.7.x if you are planning on using ES-Hadoop. 2.8 is not yet supported.


(Mike Barretta) #4

Doh, thanks for the clarification, @james.baiera!


(Jlogan) #5

Thank you for your help.


(Jlogan) #6

Thanks, I also used 2.7.x.


(Samy Nathan) #7

Hi,
I have a large amount of data in an ES (2.4.1) index. I need to move the index data into Hadoop HDFS storage. How do I do this?

These are the steps I have done so far:

ES version 2.4.1
Apache Hadoop version 2.7.3

/elasticsearch-2.4.1$ bin/plugin install elasticsearch/elasticsearch-repository-hdfs/2.4.1

Start Elasticsearch
Start Hadoop

What should I do after that?

