Can Elasticsearch read and store data in HDFS via es-hadoop?

Cancellala · February 2, 2016, 9:05am

Hi all,
I am running Elasticsearch on YARN, following this page: https://www.elastic.co/guide/en/elasticsearch/hadoop/master/ey-usage.html#yarn-provision-es

I saw that the Elasticsearch service runs on a random datanode, and that the Elasticsearch data is stored locally, not in HDFS. Why is that?

As I understand it, the es-hadoop home page says that es-hadoop lets Elasticsearch read and store data on HDFS. Is that right?

Could anybody please show me an example of a successful installation, preferably with installation documentation? I really need it.

Thank you!

Cancellala · February 18, 2016, 7:53am

Can nobody answer me? Please~

costin · February 21, 2016, 10:49pm

If one really wants to run ES directly on HDFS, they can do so right now by mounting HDFS as a local NFS partition. However, not only will it be slow, but there might also be some data loss due to the differing semantics: HDFS is not an actual file system.

As for the data, it can be indexed and queried from Hadoop (and its various libraries) in a native, parallel way through the ES-Hadoop connector.
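
For illustration only, here is a minimal sketch of what indexing through the connector might look like from PySpark. It assumes the ES-Hadoop jar is on the Spark classpath; the host name `es-host-1`, the resource `logs/entry`, and the helper names `build_es_conf` and `index_rdd` are made up for this example and do not come from the thread. Note that the data still ends up in Elasticsearch's own data path, not in HDFS.

```python
def build_es_conf(nodes, resource, port=9200):
    """Return ES-Hadoop settings as a plain dict of strings.

    es.nodes, es.port and es.resource are ES-Hadoop configuration keys;
    the values passed in are placeholders chosen by the caller.
    """
    return {
        "es.nodes": nodes,
        "es.port": str(port),
        "es.resource": resource,
    }


def index_rdd(rdd, conf):
    """Write a PySpark RDD of (key, dict) pairs to Elasticsearch.

    Assumption: `rdd` is a pyspark RDD and the ES-Hadoop jar is available
    to Spark. The key of each pair is ignored by the output format.
    """
    rdd.saveAsNewAPIHadoopFile(
        path="-",  # not used by EsOutputFormat, but the argument is required
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )


if __name__ == "__main__":
    # Hypothetical host and index/type, shown only to print the settings.
    print(build_es_conf("es-host-1", "logs/entry"))
```

With a real SparkContext `sc`, usage would be along the lines of `index_rdd(sc.parallelize([("id1", {"msg": "hello"})]), build_es_conf("es-host-1", "logs/entry"))`.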

Cancellala · February 26, 2016, 2:40am

Hi Costin,
Many thanks for your kind tips.
I still have some questions:
1) How can I solve the problem of the Elasticsearch service running on a random datanode? It has confused me for a long time.
2) Could you please show me an example of a successful installation, preferably with installation documentation?

Best wishes.

costin · March 3, 2016, 9:39am

You're welcome.

1) By taking care of provisioning yourself: Puppet, Chef, basic ssh plus scripts, whatever is easier for you to deploy ES on the given machines. YARN doesn't provide any type of provisioning and makes no guarantees about where a (short-lived) process runs.
2) See 1. If you really want to use YARN, the docs already contain examples of how to start and stop it. If you need finer-grained control, etc., then I'm afraid ES-YARN does not provide it, at least in its current form.

krish0608 · May 23, 2016, 6:28am

Hi,

I am facing the same problem as you. If you have solved it, please let me know how I can fix it.
I have a Hadoop cluster, and my Elasticsearch service runs on a random datanode, as yours does, but whenever I put data into ES it is stored in the default Elasticsearch data path.
Please help me out; I have been stuck on this for a long time. With a single Hadoop node and a single ES node, though, it works fine.
