Query HDFS

Does anybody know if Elasticsearch can directly query HDFS, where the target data is stored?
In other words, is it possible for Elasticsearch to run a query without holding any of the data itself?
I'm not familiar with the Hadoop HDFS Repository Plugin, but is that the right way to do this?

No, that's not possible. Data needs to be indexed into Elasticsearch to be available for queries.

I just learned that _source has an enabled option, and thought setting it to disabled might be a possible way, though I'm not 100% sure.

BTW, are you saying that Elasticsearch can't query HDFS at all?
I'm looking for a reliable and proven architecture combining ES and HDFS, as I've heard that master data shouldn't be kept in ES but in HDFS instead, for data safety.



Where did you hear that?

OK, let me rephrase my questions.

- Can ES be a storage for master data? And does it require a data backup strategy? (Can ES be a reliable data store?)
I'm not 100% sure yet, but my guess is that HDFS serves as backup storage for ES's data, while ES holds the searchable data. (Data needs to be loaded into ES.)

- Can ES search inside PDF files?

- Does ES require the ES-Hadoop connector to exchange data between ES and HDFS? Are there any alternatives?

I'm a newbie to ES and learning it from scratch now, but it takes time... Your help is a great pointer for (re)directing my learning.


Yes and Yes.
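
On the backup side, the HDFS repository plugin mentioned earlier is used for snapshots, not for querying. A rough sketch of the request that registers such a repository, assuming the plugin is installed; the repository name, namenode address and path below are made-up placeholders, and the helper function is only for illustration:

```python
import json


def hdfs_repository_request(repo_name, namenode_uri, path):
    """Build the HTTP method, endpoint and JSON body that register
    an HDFS-backed snapshot repository (names are hypothetical)."""
    body = {
        "type": "hdfs",
        "settings": {
            "uri": namenode_uri,  # e.g. hdfs://<host>:<port>/
            "path": path,         # directory inside HDFS for snapshot files
        },
    }
    return "PUT", f"/_snapshot/{repo_name}", json.dumps(body)


# Placeholder values; replace with your own cluster details.
method, endpoint, body = hdfs_repository_request(
    "my_hdfs_backup", "hdfs://namenode:8020/", "elasticsearch/snapshots"
)
print(method, endpoint)
print(body)
```

Snapshots stored this way are restored back into ES before they can be searched; they are not queried in place.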

If you index them in ES, yes.
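
One common way to index PDFs is the ingest attachment processor, which expects the file content as a base64 string in a field of the document. A minimal sketch, assuming that processor is available on your cluster; the field name "data" and the helper functions are my own choices for illustration:

```python
import base64
import json


def attachment_pipeline():
    """Pipeline definition for PUT /_ingest/pipeline/<id>.
    The 'data' field name is an arbitrary choice for this example."""
    return {
        "description": "Extract text from base64-encoded files",
        "processors": [{"attachment": {"field": "data"}}],
    }


def pdf_document(pdf_bytes, filename):
    """Document body to index with ?pipeline=<id>; the raw PDF bytes
    are base64-encoded because the processor expects that encoding."""
    return {
        "filename": filename,
        "data": base64.b64encode(pdf_bytes).decode("ascii"),
    }


# Stand-in bytes; in practice read the real file with open(path, "rb").
doc = pdf_document(b"%PDF-1.4 example", "report.pdf")
print(json.dumps(attachment_pipeline()))
print(json.dumps(doc))
```

The extracted text is what gets indexed and searched, so the PDF still has to pass through ES at index time.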

Yes. DIY.
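
As a rough idea of what DIY can look like: read the records out of HDFS with whatever client you already use, turn them into a _bulk request body, and POST that to ES. The sketch below covers only the payload step and assumes the records are already loaded as Python dicts; the index name and function name are hypothetical:

```python
import json


def to_bulk_payload(records, index_name):
    """Convert an iterable of dicts into the newline-delimited JSON
    body expected by POST /_bulk (one action line, then one source line)."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    if not lines:
        return ""
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Stand-in records; in practice these would come from files stored in HDFS.
records = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]
print(to_bulk_payload(records, "hdfs-export"), end="")
```

The payload is sent with the Content-Type header set to application/x-ndjson. For large exports you would send it in batches instead of one huge request.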

