[hadoop][pig] Using ES UDF to connect over HTTPS through Apache to ES

Hi,
I am trying to configure a system to use both Basic Authentication and
HTTPS to store data in Elasticsearch.

My setup is a Pig script running on Hadoop that connects to Apache
(configured as a proxy), which forwards the requests to Elasticsearch.
Plain HTTP with Basic Authentication works correctly. However, when I
try to force the ES UDF to use HTTPS, I get errors in my Apache logs and
the job fails.

The relevant snippet of my Pig script is below:
REGISTER /bigdata/cloudera/ES_HadoopJar/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-2.0.2.jar

DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage(
'es.nodes=https://127.0.0.1:28443',
'es.net.proxy.http.host=https://127.0.0.1',
'es.net.proxy.http.port=28443',
'es.net.proxy.http.user=myuser',
'es.net.proxy.http.pass=mypass',
'es.http.retries=10');

data = LOAD... ...

STORE data INTO 'my_data_index/data' USING EsStorage;

The error output to the Apache log is as follows:
SSL Library Error: error:1407609B:SSL routines:SSL23_GET_CLIENT_HELLO:https proxy request -- speaking HTTP to HTTPS port!?

The error/stacktrace from my Map job is as follows:
Error: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[https://127.0.0.1:28443]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:123)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:284)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:288)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:117)
    at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:99)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:59)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:180)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:157)
    at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:196)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

So my question is, is this possible (i.e. can it work)? And if so, where am
I going wrong?

Thanks in advance for any help.

Aidan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7d19a21e-2947-4ac4-8e3e-68cc8c25185b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

That's because es-hadoop does not currently support SSL (and thus HTTPS). There are plans to add it in 2.1,
but we are not there yet.
In the meantime, I suggest using either a plain HTTP proxy or an HTTP-to-HTTPS proxy.
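
As a rough sketch of the plain-HTTP route, the DEFINE from the question could be pointed at an HTTP listener instead of the HTTPS one. This assumes a proxy endpoint that accepts unencrypted HTTP (port 28080 here is hypothetical) and handles any HTTPS on its upstream side. The scheme prefixes are dropped on the assumption that the settings expect a bare host and port. The layout otherwise mirrors the original script and has not been verified against this setup.

```pig
-- Sketch only: 28080 is a hypothetical plain-HTTP port on the proxy.
-- es-hadoop speaks HTTP to this endpoint; any HTTPS hop is assumed to be
-- handled by the proxy itself, outside es-hadoop.
DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=127.0.0.1:28080',
    'es.net.proxy.http.host=127.0.0.1',
    'es.net.proxy.http.port=28080',
    'es.net.proxy.http.user=myuser',
    'es.net.proxy.http.pass=mypass',
    'es.http.retries=10');

STORE data INTO 'my_data_index/data' USING EsStorage;
```

Note that with this arrangement the credentials travel unencrypted between the Hadoop task and the proxy, so it only makes sense if that hop is on a trusted network.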

Cheers,

On 10/22/14 7:11 PM, Aidan Higgins wrote:

--
Costin
