I'm trying to index data in elasticsearch about 77M documents with 150 fields .
We dont have much computing/memory resource so our cluster is 3 nodes ( 48GB RAM /24 CPU and 6TB of storage )
I'm sending data from another spark cluster in another virtual network but the two networks are paired and I CAN PING ALL the els nodes from the spark cluster nodes .
the problem that I'm facing : is that at a certain amount of documents indexed ( about 8M ) spark cannot connect to els and it throws the following error :
Job aborted due to stage failure: Task 173 in stage 9.0 failed 4 times, most recent failure: Lost task 173.3 in stage 9.0 (TID 17160, wn21-swspar.of12wietsveu3a3voc5bflf1pa.ax.internal.cloudapp.net, executor 3): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.0.0.12:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:149)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:466)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:450)
at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:186)
at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:248)
at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:270)
at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:210)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:187)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:67)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I don't know what can cause this. Is the cluster size ( RAM/CPU) not enougth or is there a special configuration for indexes with huge amount of data ?
what I'm sure about is that it's not a network problem .
ELS version : 6.2.4
thank you .