Hi,
I am attempting to get started with the Elasticsearch Spark connector, and I notice that my SQL query never gets translated to an ES search query, even with pushdown enabled.
Could you let me know what is wrong in the following set of steps? I registered the DataFrame through Spark's DataSource API, but I still don't see the query getting executed on Elasticsearch.
SparkConf conf = new SparkConf().setAppName("Simple Application");

Map<String, String> dataFrameOptions = new HashMap<String, String>();
dataFrameOptions.put("es.resource", "myindex/account");
dataFrameOptions.put("es.nodes", "192.168.224.94");
dataFrameOptions.put("es.port", "9200");
dataFrameOptions.put("es.index.auto.create", "no");
dataFrameOptions.put("es.nodes.discovery", "false");
dataFrameOptions.put("pushdown", "true");
dataFrameOptions.put("double.filtering", "false");

JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

DataFrame myEsDump = sqlContext.read()
    .format("org.elasticsearch.spark.sql")
    .options(dataFrameOptions)
    .load("myindex/account");
myEsDump.registerTempTable("allAccounts");

DataFrame accounts = sqlContext.sql("SELECT name FROM allAccounts WHERE name = 'Name-888'");
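One thing I am unsure about: my understanding is that Spark evaluates DataFrame transformations lazily, so nothing should hit Elasticsearch until an action such as show() or collect() runs on the result. The behaviour is analogous to plain Java streams, where intermediate operations execute only when a terminal operation is invoked. A minimal, Spark-free sketch of that laziness (class and variable names here are just for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        List<String> touched = new ArrayList<>();

        // Intermediate operation: records each element it sees,
        // but nothing runs at this point -- the pipeline is only described.
        Stream<String> filtered = Stream.of("Name-1", "Name-888", "Name-2")
                .filter(name -> { touched.add(name); return name.equals("Name-888"); });

        System.out.println("after filter(): touched=" + touched);
        // -> after filter(): touched=[]

        // Terminal operation triggers the whole pipeline,
        // the way a Spark action would finally send the query to the source.
        List<String> result = filtered.collect(Collectors.toList());

        System.out.println("after collect(): result=" + result + ", touched=" + touched);
        // -> after collect(): result=[Name-888], touched=[Name-1, Name-888, Name-2]
    }
}
```

So my expectation is that the ES query would be issued only at the point where an action consumes the `accounts` DataFrame, not when `sqlContext.sql(...)` returns.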
Here are the versions that I am using:
<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark_2.10</artifactId>
    <version>2.2.0-rc1</version>
</dependency>
Also, in the logs I see the lines below; however, I don't see the query being run on Elasticsearch (I have search/fetch slow logs enabled with a 0s threshold).
16/02/01 11:49:23 DEBUG DataSource: Pushing down filters [EqualTo(name,Name-888)]
16/02/01 11:49:23 TRACE DataSource: Transformed filters into DSL $filterString