Elasticsearch Hadoop: Pushing data to ES from Hive fails


(ravimbhatt) #1

Hi All,

I am trying to send data from Hive to ES. My job keeps getting
connection timeouts, but I do not see any errors on the ES side.

2014-04-29 21:11:08,783 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
2014-04-29 21:11:08,783 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-04-29 21:11:29,807 INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
2014-04-29 21:11:29,807 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request

......

org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col1":7225,"col2":27041,"col3":0.93}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:673)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopProtocolException: Connection error (check network and/or proxy settings) - out of nodes and retries
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:96)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:275)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:267)
    at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:326)
    at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:268)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.initSingleIndex(EsOutputFormat.java:210)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:199)
    at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:637)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:654)
    ... 9 more
Caused by: java.net.ConnectException: Connection timed out: connect
    at java.net.TwoStacksPlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:157)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:280)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:706)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:386)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
    at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:298)
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:80)
    ... 26 more

I have tried different batch sizes and the result is always the same. The ES machines become CPU bound at times, but memory never comes under strain. I have tried tweaking the settings below:

'es.batch.size.entries'='1000',
'es.http.timeout'='10m',
'es.batch.write.refresh'='false',
'es.action.heart.beat.lead'='60s'

Could this be an ES issue where it cannot keep up with the incoming data load?
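
Since the root exception in the trace is a java.net.ConnectException on connect, it may be worth ruling out a plain reachability problem before blaming load. The following is only a rough sketch of a TCP probe that could be run from one of the Hadoop task nodes. The helper name can_connect is made up for illustration, and the host and port in the usage comment are placeholders, not values from this cluster.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Hypothetical usage from a task node (host and port are placeholders):
# print(can_connect("es-node-1.example.com", 9200))
```

If a probe like this fails from the task nodes while succeeding from the machine where the Hive client runs, that would point toward network or firewall settings rather than ES being overloaded. A successful probe does not prove the opposite, since it only tests a single connection at one point in time.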

Thanks!

Ravi



(system) #2