I have a use case where the Elasticsearch Transport Client Java API is used to connect a Hadoop cluster to a remote ES cluster. However, pinging is disabled on the Hadoop cluster as a security measure. Is this why I am getting NoNodeAvailableException when trying to establish the connection? Is enabling ping required, or is there another way?
This forum is mainly about the ES-Hadoop project; in your case it looks like you are using custom code. I'm not sure what you mean by 'pinging', but a quick inspection of the network connectivity and ports between your client and ES should sort things out.
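For reference, here is a minimal connection sketch for the 2.x-era transport client (the cluster name and host below are placeholders, not values from this thread). NoNodeAvailableException often comes from the client sniffing/pinging cluster nodes it cannot reach, which can be switched off so the client only uses the address you give it:

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class EsConnect {
    public static void main(String[] args) throws Exception {
        // "my-cluster" and "es-host" are placeholders for your setup.
        Settings settings = Settings.settingsBuilder()
                .put("cluster.name", "my-cluster")
                // Disable sniffing so the client talks only to the listed
                // address and does not ping/sample other cluster nodes.
                .put("client.transport.sniff", false)
                .build();
        TransportClient client = TransportClient.builder()
                .settings(settings)
                .build()
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("es-host"), 9300));
        // ... use the client ...
        client.close();
    }
}
```

Make sure port 9300 (the transport port, not the 9200 HTTP port) is reachable from the Hadoop nodes.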
I am using custom code within an ES-Hadoop project to retrieve index names and types, because the index names and types are created dynamically on a daily basis. I went for this solution as dynamic index resolution is not supported by the Cascading API.
I have a function that uses the transport client to connect to the ES cluster and retrieve all the existing indices, and the types in each index, for a specific interval.
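That retrieval step could look roughly like the following (a sketch against the 2.x transport client API; it assumes an already-connected `TransportClient` named `client`, and the `index-*` pattern is a placeholder for whatever matches your interval):

```java
import com.carrotsearch.hppc.cursors.ObjectObjectCursor;

import org.elasticsearch.action.admin.indices.mapping.get.GetMappingsResponse;
import org.elasticsearch.cluster.metadata.MappingMetaData;
import org.elasticsearch.common.collect.ImmutableOpenMap;

// Assumes an existing, connected TransportClient named "client".
GetMappingsResponse resp = client.admin().indices()
        .prepareGetMappings("index-*") // placeholder pattern for the interval
        .get();
for (ObjectObjectCursor<String, ImmutableOpenMap<String, MappingMetaData>> index
        : resp.getMappings()) {
    String indexName = index.key;
    for (ObjectObjectCursor<String, MappingMetaData> type : index.value) {
        String typeName = type.key;
        // build the Cascading taps for indexName/typeName here
    }
}
```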
Tap esInTap = new EsTap(indexName + "/" + indexType, Fields.ALL);
Tap hdfsOutTap = new Hfs(new TextLine(new Fields("line")), hdfsPath, SinkMode.UPDATE);
Why don't you use an alias instead? You could run a cron job that, every X hours/days, takes your interval and updates the alias. Your job would then simply point to the alias - only the alias would change, and your Cascading job would remain the same.
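The alias switch that cron job performs can be a single atomic `_aliases` call (the index and alias names below are illustrative):

```json
POST /_aliases
{
  "actions": [
    { "remove": { "index": "index-2015.12.12", "alias": "current-interval" } },
    { "add":    { "index": "index-2015.12.13", "alias": "current-interval" } }
  ]
}
```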
From your code I can't tell what your specific 'interval' is, but ES-Hadoop can read from multiple or even all indices.
My requirement is exactly that: runtime resolution of index names and types. That is why I implemented the transport client to retrieve the index names and types, and then used them in the Cascading workflow to save data into HDFS.
I am not sure how an alias would help me: although the index name prefixes remain the same, the types in each index are different. Moreover, I need to store them in correspondingly named folders in HDFS.
For example, if index-2015.12.12 has type1 and type2, the folder structure in HDFS should be hdfspath/index-2015.12.12/type1/docs and hdfspath/index-2015.12.12/type2/docs.
However, if index-2015.12.13 has type1, type2 and type3, they should be saved under hdfspath/index-2015.12.13/type1/docs, hdfspath/index-2015.12.13/type2/docs and hdfspath/index-2015.12.13/type3/docs.
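That folder layout can be expressed as a small helper (an illustrative sketch, not part of the actual job code; the names mirror the examples above):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HdfsLayout {
    // Builds "basePath/index/type/docs" for every (index, type) pair.
    static List<String> outputPaths(String basePath, Map<String, List<String>> indexTypes) {
        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : indexTypes.entrySet()) {
            for (String type : e.getValue()) {
                paths.add(basePath + "/" + e.getKey() + "/" + type + "/docs");
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        Map<String, List<String>> indexTypes = new LinkedHashMap<>();
        indexTypes.put("index-2015.12.12", List.of("type1", "type2"));
        // Prints the two paths for 2015.12.12; a three-type index would
        // simply yield three paths.
        System.out.println(outputPaths("hdfspath", indexTypes));
    }
}
```

Each (index, type) pair resolved at runtime then maps to one Cascading sink tap at the corresponding path.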