Exception: Failed to execute phase

(Piyush Goyal) #1

Hi Costin,

Just wanted to check on the feasibility of actually using es-hadoop (Spark) and Elasticsearch to update 80 million documents spread across 32 shards. With the basic default configuration as per the documentation, a six-executor Spark cluster that tried to update that many documents eventually ran into the following error:

Lost task 20.1 in stage 1.0 (TID 32, ip-10-0-2-240.ec2.internal): org.apache.spark.util.TaskCompletionListenerException: SearchPhaseExecutionException[Failed to execute phase [init_scan], all shards failed]
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:83)
at org.apache.spark.scheduler.Task.run(Task.scala:72)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Also, when the Spark job started it was giving me a throughput of 3 lakh records per minute, which after 4 hours came down to 2 lakh records per minute, and eventually the job died.
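
To put those rates in absolute terms, here is a small sketch of the arithmetic. It assumes the usual meaning of lakh (1 lakh = 100,000) and a constant rate, which the job did not actually sustain; the helper names are made up for illustration.

```python
LAKH = 100_000  # 1 lakh = 100,000 (Indian numbering system)

TOTAL_DOCS = 80_000_000  # document count mentioned in the post


def per_second(lakhs_per_minute):
    """Convert a rate in lakh records per minute to records per second."""
    return lakhs_per_minute * LAKH / 60


def hours_to_finish(lakhs_per_minute, total_docs=TOTAL_DOCS):
    """Hours needed to process total_docs at a constant rate."""
    minutes = total_docs / (lakhs_per_minute * LAKH)
    return minutes / 60


for rate in (3, 2):
    print(f"{rate} lakh/min = {per_second(rate):,.0f} records/s, "
          f"{hours_to_finish(rate):.1f} h for {TOTAL_DOCS:,} docs")
```

So 3 lakh per minute is 5,000 records per second (about 4.4 hours for all 80 million documents), and 2 lakh per minute is roughly 3,333 records per second (about 6.7 hours).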

First question: is it actually recommended to update such a huge number of documents in Elasticsearch using the es-hadoop connector? If yes, what am I missing, and what's the best way to do it? If not, what does Elasticsearch recommend instead?
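
For readers following along, a plain (non-scripted) update through the connector from PySpark is typically configured roughly as sketched below. This is illustrative only: the actual job was not posted, the node address, index name, and id field are placeholders, and the batch and retry values shown are what I believe to be the connector defaults rather than tuned settings.

```python
# es-hadoop settings for a partial-document update (values are placeholders).
ES_UPDATE_CONF = {
    "es.nodes": "localhost",             # placeholder: Elasticsearch host(s)
    "es.port": "9200",
    "es.resource": "myindex/mytype",     # placeholder: target index/type
    "es.write.operation": "update",      # update existing docs instead of indexing
    "es.mapping.id": "id",               # placeholder: field holding the document id
    "es.batch.size.entries": "1000",     # docs per bulk request, per task
    "es.batch.size.bytes": "1mb",        # bytes per bulk request, per task
    "es.batch.write.retry.count": "3",   # retries when a bulk request is rejected
    "es.batch.write.retry.wait": "10s",  # wait between those retries
}


def write_updates(df, conf=ES_UPDATE_CONF):
    """Write a Spark DataFrame to Elasticsearch using the settings above.

    Assumes the es-hadoop (elasticsearch-spark) jar is on the Spark classpath
    and that every row carries the id field named in es.mapping.id.
    """
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**conf)
       .mode("append")
       .save())
```

Note that the bulk size applies per task, so the load hitting the cluster at any moment scales with the number of tasks writing concurrently.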

(Costin Leau) #2

The stack trace is not really relevant since it doesn't provide any information about Elasticsearch itself.
How many physical machines are there, what are their specs, and most importantly, what's the cluster state? There are plenty of monitoring tools, such as Marvel from Elastic and others, that are free to download and use.
The numbers need to be put in some context to get the right perspective.

Also, how are you doing the update? Are you using scripts by any chance? And so on.

This is more a question of how to get performance out of ES than of how the connector works.

P.S. 3 lakhs? What does that mean?
