Spark-SQL: Ensure SQL query gets translated to ES query


(Robbie) #1

Hi,
I am getting started with the Spark Elasticsearch connector, and I notice that my SQL query never gets translated to an ES search query, even with pushdown enabled.

Could you let me know what is wrong in the following steps? I registered the DataFrame through Spark's DataSource API, but I still don't see the query being executed on Elasticsearch.

    SparkConf conf = new SparkConf().setAppName("Simple Application");

    Map<String,String> dataFrameOptions = new HashMap<String,String>();
    dataFrameOptions.put("es.resource", "myindex/account");
    dataFrameOptions.put("es.nodes","192.168.224.94");
    dataFrameOptions.put("es.port","9200");
    dataFrameOptions.put("es.index.auto.create","no");
    dataFrameOptions.put("es.nodes.discovery","false");
    dataFrameOptions.put("pushdown","true");
    dataFrameOptions.put("double.filtering","false");

    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

    DataFrame myEsDump = sqlContext.read().format("org.elasticsearch.spark.sql").options(dataFrameOptions).load("myindex/account");
    myEsDump.registerTempTable("allAccounts");
    DataFrame accounts = sqlContext.sql("SELECT name FROM allAccounts WHERE name = 'Name-888'");

Here are the versions that I am using:

    <dependency> <!-- Spark dependency -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark_2.10</artifactId>
        <version>2.2.0-rc1</version>
    </dependency>

Also, in the logs I see the following; however, I don't see the query being run on Elasticsearch (I have search/fetch slow logs enabled with a 0s threshold):

    16/02/01 11:49:23 DEBUG DataSource: Pushing down filters [EqualTo(name,Name-888)]
    16/02/01 11:49:23 TRACE DataSource: Transformed filters into DSL $filterString
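
To make concrete what "translated to an ES query" means for the `EqualTo(name,Name-888)` filter in the log above, here is a minimal sketch in plain Java. The class `EqualToFilterSketch` and its methods are hypothetical and are not part of the connector. The `match` query it builds is only an assumed, simplified shape; the JSON the connector really sends depends on its version and settings and is likely wrapped differently.

```java
// Hypothetical illustration only: this is NOT the connector's code.
// It shows the general idea of turning an equality filter on a field
// into an Elasticsearch query DSL body.
final class EqualToFilterSketch {

    // Quote a value as a JSON string, escaping the characters JSON requires.
    static String jsonString(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Assumed, simplified mapping: EqualTo(field, value) becomes a match query.
    static String equalToAsMatchQuery(String field, String value) {
        return "{\"query\":{\"match\":{"
                + jsonString(field) + ":" + jsonString(value)
                + "}}}";
    }

    public static void main(String[] args) {
        // Same field and value as the filter shown in the DEBUG log line.
        System.out.println(equalToAsMatchQuery("name", "Name-888"));
    }
}
```

Comparing a body like this with what a packet trace shows is one way to check that the WHERE clause was pushed down rather than evaluated in Spark.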


(Robbie) #2

It turns out that the connector is issuing scan/scroll search requests. However, my index_search and index_fetch slow logs do not show them, which led me to an incorrect assumption. Packet tracing revealed that the queries are in fact being sent to ES.


(Costin Leau) #3

FWIW, 2.2 GA fixed the log messages, so at TRACE/DEBUG level one sees the actual query from both Spark and ES.
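
For anyone wanting to turn that logging on, a log4j.properties fragment along these lines may work with the log4j 1.2 setup that Spark 1.6 ships with. The two logger names are assumptions inferred from the connector's package layout and the `DataSource` logger seen in the output above, so verify them against the version you run.

```properties
# Assumed logger names; check them against your connector version.
# Spark SQL data source: pushed-down filters and the generated DSL.
log4j.logger.org.elasticsearch.spark.sql=TRACE
# REST layer: requests sent to Elasticsearch.
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
```

TRACE output is verbose, so it is best enabled only while debugging.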

