Can Elasticsearch Transport Client work without ping


(Manju) #1

Hello all,

I have a use case where the Elasticsearch Transport Client Java API is used to connect a Hadoop cluster to a remote ES cluster. However, pinging is disabled on the Hadoop cluster as a security feature. Is this why I am getting a NoNodeAvailableException while trying to establish the connection? Is it required to enable ping, or is there another way?
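For reference, the client is created roughly like this (a minimal sketch assuming the 2.x Transport Client; the cluster name, host, and timeout values are placeholders — the client.transport.* settings shown are the ones that control how the client pings and samples nodes):

```java
import java.net.InetAddress;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class ClientSketch {
    public static Client build() throws Exception {
        // Sketch only: host and cluster names are placeholders.
        Settings settings = Settings.settingsBuilder()
                .put("cluster.name", "my-es-cluster")        // must match the remote cluster
                .put("client.transport.sniff", false)        // don't try to discover other nodes
                .put("client.transport.ping_timeout", "10s") // how long to wait for a ping reply
                .build();

        return TransportClient.builder().settings(settings).build()
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("es-host.example.com"), 9300));
    }
}
```

If ICMP ping is what the security policy blocks, note that the Transport Client's "ping" is its own TCP-level liveness check on port 9300, not ICMP, so plain port connectivity is what matters.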

Thanks for your help.

Manju


(Costin Leau) #2

This forum is mainly about the ES-Hadoop project; in your case it looks like you are using custom code. I'm not sure what you mean by 'pinging', but a quick inspection of the network connectivity and ports between your client and ES should sort things out.


(Manju) #3

I am using custom code within an ES-Hadoop project to retrieve index names and types, because the indices and types are created dynamically on a daily basis. I went for this solution as dynamic resolution of index names is not supported by the Cascading API.


(Costin Leau) #4

What do you mean by "dynamic reading"? How does this work in your code? Can you provide a quick, high-level example?


(Manju) #5

I have a function that uses the Transport Client to connect to the ES cluster and retrieve all the existing indices, and the types in each index, for a specific interval.

List<String> listOfIndices = new ArrayList<String>();
String[] allIndices = client.admin().cluster().prepareState().execute().actionGet()
        .getState().getMetaData().concreteAllIndices();
Collections.addAll(listOfIndices, allIndices);

Map<String, List<String>> indexWithTypes = new HashMap<String, List<String>>();

for (String eachIndex : listOfIndices) {
    List<String> typeNames = new ArrayList<String>();
    // Fetch the mappings for this index from the cluster state
    ImmutableOpenMap<String, MappingMetaData> indexMapping = client.admin().cluster().prepareState()
            .execute().actionGet().getState().getMetaData().index(eachIndex).getMappings();
    Iterator<String> indexMappingIterator = indexMapping.keysIt();

    while (indexMappingIterator.hasNext()) {
        String mappingsResponseKey = indexMappingIterator.next();
        String typeName = indexMapping.get(mappingsResponseKey).type();
        typeNames.add(typeName);
    }
    indexWithTypes.put(eachIndex, typeNames);
}
return indexWithTypes;

I then iterate through this map using the Cascading API to get data from ES and store it in corresponding folders in Hadoop:

for (Map.Entry<String, List<String>> indexWithType : indexWithTypes.entrySet()) {
    String indexName = indexWithType.getKey();
    List<String> indexTypes = indexWithType.getValue();

    for (String indexType : indexTypes) {
        String hdfsPath = hdfsDir + "/" + indexName + "/" + indexType;
        // ... build the taps below and run the flow for this index/type
    }
}

The Tap definitions are:

Tap esInTap = new EsTap(indexName + "/" + indexType, Fields.ALL);

Tap hdfsOutTap = new Hfs(new TextLine(new Fields("line")), hdfsPath, SinkMode.UPDATE);

(Costin Leau) #6

Why don't you use an alias instead? You can run a cron job that, every X hours/days, takes your interval and updates the alias. Your job would then simply point to the alias - only the alias would change, and your Cascading job would remain the same.
From your code I can't tell what your specific 'interval' is, but ES-Hadoop can read from multiple or even all indices.
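As a sketch of the alias approach (the alias name daily-current and the index names are placeholders, and this assumes the same Transport Client API as in the code above), the cron job would swap the alias in a single atomic request:

```java
// Hypothetical alias name; add and remove happen in one request, so readers
// never see a moment where the alias points at nothing.
client.admin().indices().prepareAliases()
        .removeAlias("index-2015.12.12", "daily-current")
        .addAlias("index-2015.12.13", "daily-current")
        .execute().actionGet();
```

The Cascading job would then always read from the alias. Separately, ES-Hadoop resource strings also accept index patterns (e.g. index-2015.12.*/type1), which may cover the multi-index case without any client code at all.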


(Manju) #7

Found this in the Elasticsearch-Hadoop Cascading documentation:

My requirement is exactly that, runtime resolution of index names and types. That is why I have implemented the Transport client to retrieve the index names and types and then used them in the cascading workflow to save data into HDFS.

I am not sure how an alias would help me, because although the index name prefixes remain the same, the types in each index are different. Moreover, I need to store them in correspondingly named folders in HDFS.

For example, if index-2015.12.12 has type1 and type2, the folder structure in HDFS should be hdfspath/index-2015.12.12/type1/docs and hdfspath/index-2015.12.12/type2/docs.

However, if index-2015.12.13 has type1, type2, and type3, then they should be saved under hdfspath/index-2015.12.13/type1/docs, hdfspath/index-2015.12.13/type2/docs, and hdfspath/index-2015.12.13/type3/docs.
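The naming scheme above can be sketched as a small path-construction helper (HdfsLayout is a hypothetical class, not part of the original job; only the hdfspath/&lt;index&gt;/&lt;type&gt;/docs convention is taken from the example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HdfsLayout {
    // Builds one HDFS target path per index/type pair, following the
    // hdfspath/<index>/<type>/docs layout described above.
    static List<String> targetPaths(String hdfsDir, Map<String, List<String>> indexWithTypes) {
        List<String> paths = new ArrayList<String>();
        for (Map.Entry<String, List<String>> entry : indexWithTypes.entrySet()) {
            for (String type : entry.getValue()) {
                paths.add(hdfsDir + "/" + entry.getKey() + "/" + type + "/docs");
            }
        }
        return paths;
    }
}
```

Because each day's index can carry a different set of types, the helper simply emits however many paths that index's entry contains.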

