If you really want to run ES directly on HDFS, you can do so right now by mounting HDFS as a local NFS partition. However, it will not only be slow, but you may also lose data because of the differing semantics: HDFS is not an actual file system.
As for the data, it can be indexed and queried from Hadoop (and its various libraries) in a native, parallel way through the ES-Hadoop connector.
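As a rough illustration of the connector route, here is a minimal PySpark sketch. It assumes the elasticsearch-hadoop jar is on the Spark classpath; the host name, index name and helper function names are made up for the example, and the exact `es.resource` format depends on your ES version.

```python
def es_hadoop_conf(nodes, resource, port=9200):
    """Build the settings dict passed to the ES-Hadoop input/output formats.

    `nodes` and `resource` are placeholders chosen by the caller, e.g.
    "es-host-1" and "myindex/mytype" (both hypothetical).
    """
    return {
        "es.nodes": nodes,        # ES node(s) to connect to
        "es.port": str(port),     # Hadoop conf values must be strings
        "es.resource": resource,  # target index (and type, on older ES)
    }


def index_docs(sc, docs, conf):
    """Write a list of dicts to ES in parallel from a SparkContext `sc`."""
    # EsOutputFormat ignores the key, so any placeholder works here.
    rdd = sc.parallelize(docs).map(lambda doc: ("ignored-key", doc))
    rdd.saveAsNewAPIHadoopFile(
        path="-",  # unused by EsOutputFormat, but the argument is required
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )


def query_docs(sc, conf):
    """Read documents back as an RDD of (id, document) pairs."""
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )
```

With a running cluster you would call something like `index_docs(sc, [{"msg": "hello"}], es_hadoop_conf("es-host-1", "myindex/mytype"))`. ES itself keeps its data on local disk; only the reading and writing happens from Hadoop.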
Hi Costin,
Many thanks for your kind tips.
I still have some questions:
1) How do you deal with the Elasticsearch service running on a random datanode? This has confused me for a long time.
2) Could you please show me an example of a successful installation, ideally with installation documentation?
1) By taking care of provisioning yourself: Puppet, Chef, basic ssh+scripts, whatever is easiest for you to deploy ES on the given machines. YARN doesn't provide any type of provisioning and makes no guarantees of where a (short-lived) process runs.
2) See 1. If you really want to use YARN, the docs already contain examples of how to start and stop it. If you need finer-grained control, etc., then I'm afraid ES-YARN does not provide it, at least in its current form.
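For the provisioning point above, a basic ssh+scripts approach could look roughly like the sketch below. It only builds the ssh command lines for a fixed list of machines, so ES always runs on hosts you picked rather than wherever YARN places it. The host names, the `es` user and the `/opt/elasticsearch` install path are assumptions for the example; it also assumes ES is already unpacked on each machine.

```python
import shlex


def start_commands(hosts, es_home="/opt/elasticsearch", user="es"):
    """Return one ssh command (as an argv list) per host.

    Each command starts Elasticsearch as a daemon (`-d`) on that host.
    `es_home` and `user` are hypothetical defaults; adjust to your setup.
    """
    remote = f"{shlex.quote(es_home + '/bin/elasticsearch')} -d"
    return [["ssh", f"{user}@{host}", remote] for host in hosts]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in start_commands(["node1", "node2"]):
        print(" ".join(argv))
        # To actually run it: subprocess.run(argv, check=True)
```

Puppet or Chef would do the same job with more safety nets; the point is only that the list of machines is fixed by you.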
I am facing the same problem as you. If you have solved it, please let me know how I can fix it.
I have a Hadoop cluster, and my Elasticsearch service runs on a random datanode, as yours does, but whenever I put data into ES it is stored in the default Elasticsearch data path.
Please help me out; I have been stuck on this for a long time. With a single Hadoop node and a single ES node it works fine.