Hi all,
I am trying to send data from Hive to Elasticsearch (ES). My job keeps
getting connection timeouts, but on the ES side I do not see any errors.
Here is the relevant log output:
2014-04-29 21:11:08,783 INFO
org.apache.commons.httpclient.HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Connection
timed out: connect
2014-04-29 21:11:08,783 INFO
org.apache.commons.httpclient.HttpMethodDirector: Retrying request
2014-04-29 21:11:29,807 INFO
org.apache.commons.httpclient.HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Connection
timed out: connect
2014-04-29 21:11:29,807 INFO
org.apache.commons.httpclient.HttpMethodDirector: Retrying request
......
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col1":7225,"col2":27041,"col3":0.93}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:673)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopProtocolException: Connection error (check network and/or proxy settings) - out of nodes and retries
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:96)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:275)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:267)
at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:326)
at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:268)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.initSingleIndex(EsOutputFormat.java:210)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:199)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:637)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:654)
... 9 more
Caused by: java.net.ConnectException: Connection timed out: connect
at java.net.TwoStacksPlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:157)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:280)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:706)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:386)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:298)
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:80)
... 26 more
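For what it's worth, the repeated java.net.ConnectException at the bottom of the trace means the TCP connection to the ES HTTP port never opens at all, which usually points at network reachability (firewall, wrong es.nodes host, ES bound to a different interface) rather than ES being overloaded. A quick way to check this from one of the Hadoop task nodes is a small script like the sketch below (the host names and port 9200 are placeholders for whatever your es.nodes setting contains):

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and opens a plain TCP socket,
        # which is exactly what the es-hadoop HTTP client needs to do first.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts alike.
        return False

# Example (run from a task node; hosts are placeholders for your es.nodes):
#   can_connect("es-host1", 9200)
```

If this returns False from the task nodes but True from the machine you submit the job on, the data nodes simply cannot reach ES and no batch-size tuning will help.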
I have tried different batch sizes and the result has been the same. The ES machines at times become CPU-bound, but memory never comes under strain. I have tried tweaking the settings below:
'es.batch.size.entries'='1000',
'es.http.timeout'='10m',
'es.batch.write.refresh'='false',
'es.action.heart.beat.lead'='60s'
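For context, these are set as TBLPROPERTIES on the ES-backed external table, roughly like the sketch below (the table name, column types, index/type, and es.nodes hosts are placeholders, not my actual values):

```sql
-- Sketch of the ES-backed Hive table; names and hosts are placeholders.
CREATE EXTERNAL TABLE es_export (col1 BIGINT, col2 BIGINT, col3 DOUBLE)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'myindex/mytype',
  -- es.nodes must be reachable from every Hadoop task node, not just the client:
  'es.nodes' = 'es-host1:9200,es-host2:9200',
  'es.batch.size.entries' = '1000',
  'es.http.timeout' = '10m',
  'es.batch.write.refresh' = 'false',
  'es.action.heart.beat.lead' = '60s'
);
```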
Could this be an ES issue where it cannot keep up with the incoming data load?
Thanks!
Ravi