Just wanted to check on the feasibility of using es-hadoop (Spark) with Elasticsearch to update 80 million documents spread across 32 shards. With the basic default configuration as per the documentation, a 6-executor Spark cluster attempting to update that many documents eventually ran into the following error:
Lost task 20.1 in stage 1.0 (TID 32, ip-10-0-2-240.ec2.internal): org.apache.spark.util.TaskCompletionListenerException: SearchPhaseExecutionException[Failed to execute phase [init_scan], all shards failed]
Also, when the Spark job started it was giving me a throughput of about 300,000 records per minute, which after 4 hours dropped to about 200,000 records per minute, and eventually the job died.
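For reference, the job was run with essentially the documented defaults. A sketch of the kind of configuration I mean is below; the setting names come from the es-hadoop documentation, but the node address, index/type, and id field here are illustrative, not my exact job:

```python
# Illustrative es-hadoop settings for an update job (not my exact config).
# Setting names are from the es-hadoop configuration docs; values are examples.
es_conf = {
    "es.nodes": "10.0.2.240",         # example ES node address
    "es.resource": "myindex/mytype",  # hypothetical index/type
    "es.write.operation": "update",   # update existing documents rather than index
    "es.mapping.id": "id",            # document field used as the _id for updates
    # Bulk/scroll knobs such as es.batch.size.entries, es.batch.size.bytes,
    # and es.scroll.size were left at their defaults for my es-hadoop version.
}

# In Scala the RDD is then written with saveToEs from
# org.elasticsearch.spark._, passing this config map.
```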
First question: is it actually recommended to update such a huge number of documents in Elasticsearch using the es-hadoop connector? If yes, what am I missing and what's the best way to do it? If not, what does Elasticsearch recommend instead?