Elastic Search - JavaEsSpark read is slow

sukumar · October 17, 2016, 3:27am

I am using Spark 1.4.1 and trying to read from Elasticsearch. The read takes 1600 ms, but when I run the same query in Sense it takes just 3 ms. Can anyone help me improve the query performance?

JavaPairRDD<String, String> esResultPair = JavaEsSpark.esJsonRDD(ctx, "index/02", string);

ebuildy · October 17, 2016, 7:18am

Why are you using Spark?

Latency in "big data" tooling is usually somewhere between 1 second and 1 minute, so I imagine this is quite normal and not Elasticsearch related (starting the job, scheduling resources, etc.).

james.baiera · October 17, 2016, 3:57pm

Could you elaborate on what you're trying to read? Also could you elaborate on the query you are running in Sense?

sukumar · October 18, 2016, 12:15am

I replaced JavaEsSpark with JestClient and it's working now. One request takes around 120 ms. Thanks.

sukumar · October 18, 2016, 12:21am

I am running a job on Spark 1.4.1 and tried to replace the existing job with Spark and Elasticsearch. Writing to Elasticsearch with JavaEsSpark is really fast, but reads are not performing as expected. Here is the simple query I ran in both Spark and Sense.

{
  "query" : {
    "bool" : {
      "must" : [ {
        "match" : {
          "full_name" : {
            "query" : "JACQUELINE"
          }
        }
      }, {
        "match" : {
          "full_name" : {
            "query" : "COLWILL"
          }
        }
      }, {
        "bool" : {
          "should" : [ {
            "match" : {
              "acct" : {
                "query" : 0
              }
            }
          }, {
            "match" : {
              "ids" : {
                "query" : ""
              }
            }
          } ]
        }
      } ]
    }
  }
}

ebuildy · October 18, 2016, 10:16am

How did you replace it like this? I am very curious to see the source code and how you submit your Spark job.

james.baiera · October 18, 2016, 3:49pm

Your query looks very specific, and thus will probably only retrieve a handful of results. It's important to note the difference between Sense and Spark. Sense is a GUI Client for Elasticsearch queries. Sense will only return the top ten results that match your query. It takes advantage of Elasticsearch's search features to do this incredibly fast (on the order of milliseconds). Spark is meant for heavy duty data processing. When using EsSpark, it targets a different search type which streams all of the data out of Elasticsearch for analysis in Spark. This tends to be a heavier request mechanic (operating over the course of multiple seconds).

If you are using this same query in Spark for reading, Spark will end up wasting a lot of time standing up multiple tasks to read the data from Elasticsearch, only to get a few records and then go through a costly job teardown process. EsSpark is not meant to be a fast client for retrieving very few records. It is meant to be a connector for data processing at scale. A query this specific would probably be better served by a regular application client.
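
For illustration only, here is a minimal sketch of what "a regular application client" could look like, using the JDK's built-in java.net.http client (Java 11+) instead of Spark. It is not the code used in this thread. The host http://localhost:9200, the class name EsSearchSketch, and the helper names buildQuery and buildRequest are assumptions made for the example; the "index/02" resource and the two full_name terms are taken from the posts above, and the optional "should" clauses are omitted for brevity.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class EsSearchSketch {

    // Assumption: Elasticsearch listens here; adjust for your cluster.
    static final String BASE_URL = "http://localhost:9200";

    // Builds a bool query that requires both name terms and asks for at most
    // ten hits. Inputs are NOT JSON-escaped; use a JSON library for real data.
    static String buildQuery(String firstName, String lastName) {
        return "{\"size\":10,\"query\":{\"bool\":{\"must\":["
            + "{\"match\":{\"full_name\":{\"query\":\"" + firstName + "\"}}},"
            + "{\"match\":{\"full_name\":{\"query\":\"" + lastName + "\"}}}"
            + "]}}}";
    }

    // Builds a plain _search request; no Spark job or task scheduling involved.
    static HttpRequest buildRequest(String resource, String body) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + resource + "/_search"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = buildRequest("index/02", buildQuery("JACQUELINE", "COLWILL"));

        // Time the single round trip, similar to what Sense reports.
        long start = System.nanoTime();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("HTTP " + response.statusCode() + " in " + elapsedMs + " ms");
        System.out.println(response.body());
    }
}
```

Running main needs a reachable Elasticsearch node. The two builder methods can be exercised without one, which makes it easy to check the request before sending it.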

sukumar · October 18, 2016, 6:29pm

Thanks for the clarification. Let me try a larger query against a large data set.
