I am using Spark 1.4.1 and trying to read from Elasticsearch. It takes 1600 ms, but when I run the same query using Sense it takes just 3 ms. Can anyone help me improve the query performance?
Usually latency in "big data" systems is on the order of a second to a minute, so I imagine this is quite normal and not Elasticsearch related (job startup, resource scheduling, etc.).
I am running the job with Spark 1.4.1 and tried to replace an existing job with Spark and Elasticsearch. Writing to Elasticsearch from Spark using JavaEsSpark is really fast, but reading is not as fast as expected. Please find the simple query I ran in Spark and Sense.
Your query looks very specific, and thus will probably only retrieve a handful of results. It's important to note the difference between Sense and Spark. Sense is a GUI client for Elasticsearch queries. Sense will only return the top ten results that match your query, taking advantage of Elasticsearch's search features to do this incredibly fast (on the order of milliseconds). Spark is meant for heavy-duty data processing. When reading with EsSpark, it uses a different search mechanism that streams all of the matching data out of Elasticsearch for analysis in Spark. This is a much heavier request pattern, typically operating over the course of multiple seconds.
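To make the difference concrete, here is an illustrative sketch (not the actual Sense or EsSpark internals) of the two request styles in terms of the Elasticsearch query DSL. The `match` query on `user`/`kimchy` is a placeholder; the point is how much data each style asks Elasticsearch to return.

```python
import json

# A Sense-style search: Elasticsearch returns only the top matches.
# If you omit "size", the default page size is 10 hits, which is why
# Sense comes back in milliseconds.
sense_style_request = {
    "query": {"match": {"user": "kimchy"}},  # placeholder query
    "size": 10,                              # the default page size
}

# An EsSpark-style read streams *every* matching document out in
# batches via the scroll API, one batch at a time per Spark task.
scroll_style_request = {
    "query": {"match": {"user": "kimchy"}},
    "size": 1000,                            # a large batch per scroll page
}
scroll_params = {"scroll": "5m"}             # keep the scroll context alive between batches

print(json.dumps(sense_style_request))
print(json.dumps(scroll_style_request), json.dumps(scroll_params))
```

So even for an identical `query` clause, the first request does one cheap top-N search, while the second walks the entire result set, which is what dominates the 1600 ms you are seeing.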
If you run this same query through Spark, Spark will waste a lot of time standing up multiple tasks to read from Elasticsearch, retrieve only a few records, and then go through a costly job-teardown process. EsSpark is not meant to be a fast client for retrieving a handful of records; it is a connector for data processing at scale. A query this specific would probably be served better by a regular application client.