Can Elasticsearch read and store data in HDFS through es-hadoop?


(Gavin Wei) #1

Hi all,
I am running Elasticsearch on YARN, following this page: https://www.elastic.co/guide/en/elasticsearch/hadoop/master/ey-usage.html#yarn-provision-es

I see that the Elasticsearch service runs on an arbitrary datanode, and that the Elasticsearch data is stored locally, not in HDFS. Why is that?

From the es-hadoop home page, I understood that es-hadoop lets Elasticsearch read and store data on HDFS. Is that right?

Could anybody please show me an example of a successful installation, preferably with installation documentation? I really need it.

Thank you!


(Gavin Wei) #2

Can nobody answer this? Please.


(Costin Leau) #3

If one really wants to run ES directly on HDFS, they can do so right now by mounting HDFS as a local NFS partition. However, it will not only be slow; there might also be some data loss because of the differing semantics: HDFS is not an actual file system.

As for the data, it can be indexed and queried from Hadoop (and its various libraries) in a native, parallel way through the ES-Hadoop connector.
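
To illustrate the connector route, here is a minimal sketch, assuming PySpark with the es-hadoop jar on the classpath. The host name, HDFS path, and `logs/entry` resource are hypothetical placeholders, and the two helper functions are made up for this example; `es.nodes`, `es.port`, and `es.resource` are es-hadoop configuration keys. The data stays in HDFS and is indexed into Elasticsearch, which keeps its own index files on local disk.

```python
def es_write_options(nodes, port=9200, resource="logs/entry"):
    """Build the option map that es-hadoop reads when writing to Elasticsearch.

    `nodes` is a comma-separated list of Elasticsearch hosts (hypothetical here).
    """
    return {
        "es.nodes": nodes,
        "es.port": str(port),
        "es.resource": resource,  # target index/type
    }


def index_from_hdfs(hdfs_path, options):
    """Read JSON records from HDFS and index them into Elasticsearch."""
    # Imported lazily so that es_write_options() works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-to-es").getOrCreate()
    df = spark.read.json(hdfs_path)
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**options)
       .mode("append")
       .save(options["es.resource"]))


if __name__ == "__main__":
    opts = es_write_options("es-node1")
    print(opts)
    # Requires a running Spark and Elasticsearch setup (path is hypothetical):
    # index_from_hdfs("hdfs:///data/logs/*.json", opts)
```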


(Gavin Wei) #4

Hi Costin,
Many thanks for your kind tips.
I still have two questions:
1) How can I solve the problem of the Elasticsearch service running on an arbitrary datanode? It has confused me for a long time.
2) Could you please show me an example of a successful installation, preferably with installation documentation?

Best wishes.


(Costin Leau) #5

You're welcome.

  1. By taking care of the provisioning yourself: Puppet, Chef, basic ssh+scripts, whichever makes it easiest for you to deploy ES on the given machines. YARN doesn't provide any type of provisioning and makes no guarantees about where a (short-lived) process runs.

  2. See 1. If you really want to use YARN, the docs already contain examples of how to start and stop it. If you need finer-grained control, etc., then I'm afraid ES-YARN does not provide it, at least in its current form.
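
As a rough illustration of the "basic ssh+scripts" option in point 1, the sketch below renders one `elasticsearch.yml` per host that you pick yourself. The host names, cluster name, and data path are hypothetical, and the helper functions are invented for this example; `cluster.name`, `node.name`, and `path.data` are standard Elasticsearch settings. Copying the files to the machines (scp, Puppet, Chef, and so on) is left out.

```python
def render_es_config(cluster_name, node_name, data_path):
    """Return the text of a minimal elasticsearch.yml for one node."""
    lines = [
        f"cluster.name: {cluster_name}",
        f"node.name: {node_name}",
        f"path.data: {data_path}",  # local directory where this node stores its index
    ]
    return "\n".join(lines) + "\n"


def render_all(cluster_name, hosts, data_path):
    """Map each explicitly chosen host to the config file it should receive."""
    return {host: render_es_config(cluster_name, host, data_path) for host in hosts}


if __name__ == "__main__":
    # Hypothetical hosts: you decide where ES runs instead of leaving it to YARN.
    hosts = ["es-node1", "es-node2"]
    for host, text in render_all("my-cluster", hosts, "/var/lib/elasticsearch").items():
        print(f"# {host}")
        print(text)
```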


(krishna singh) #6

Hi,

I am facing the same problem as you. If you have solved it, please let me know how I can fix it.
I have a Hadoop cluster, and my Elasticsearch service runs on an arbitrary datanode, as yours does, but whenever I put data into ES, it is stored in the default Elasticsearch data path.
Please help me out; I have been stuck on this for a long time. With a single Hadoop node and a single ES node, it works fine.

