Spark SQL: How to specify a date range in a WHERE clause?

Hello,

I've had some success working with the Spark SQL CLI to access our ES data.
Environment:
ES: 1.3.2
es-hadoop: elasticsearch-hadoop-2.2.0
Spark: spark-1.4.1-bin-hadoop2.6

The pushdown feature is working great!

However, I'm stuck on how to specify a date range in the WHERE clause so that it gets pushed down to ES.

I've tried:

SELECT count(member_id), member_id
FROM data
WHERE response_timestamp > CAST('2015-03-01' AS date)
  AND response_timestamp < CAST('2015-03-31' AS date)
GROUP BY member_id;

but it doesn't push down the range filter to ES.

Is this possible with the current es-hadoop version in the first place?
Please help!

ES-Hadoop implements all the filters/pushdown hooks available in Spark SQL. If Spark SQL doesn't push the operation, ES-Hadoop has no chance of doing the translation.
In your case, the CAST is likely the culprit.
Furthermore, I would recommend upgrading to Spark 1.6.1, as it has significant new features in Spark SQL.
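For what it's worth, one CAST-free way to express the same range is through the DataFrame API with plain ISO-8601 string literals, which Spark SQL can hand to the data source as simple comparison filters. A minimal sketch, assuming the es-hadoop Spark SQL integration is on the classpath; the node address, index/type name, and field names are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._  // adds esDF(...) to SQLContext

// Placeholder ES node and app name -- adjust for your cluster.
val conf = new SparkConf()
  .setAppName("es-pushdown-sketch")
  .set("es.nodes", "localhost:9200")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Placeholder index/type.
val df = sqlContext.esDF("myindex/data")

// Plain string comparisons against ISO-8601 literals: no CAST for
// Catalyst to evaluate, so the predicates stay simple GreaterThan /
// LessThan filters that can be pushed down to the source.
val filtered = df.filter(
  df("response_timestamp") > "2015-03-01" &&
  df("response_timestamp") < "2015-03-31")

filtered.groupBy("member_id").count().show()
```

This is a sketch rather than a guaranteed fix; whether the filter is actually pushed down still depends on what Catalyst hands to the data source.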

I'm having the same issue. In the original example, I don't see how the CAST would be to blame: the CAST should be evaluated once, the constant string converted to a date, and the resulting constant date pushed down into Elasticsearch.

So, is there a hook that is needed? Or is there a better way to formulate the query so that it does push down the (very simple) date filter into ES?

I have an index with millions of rows, and I want to filter on date. But if the date filter is not pushed down, the millions of rows are loaded into Spark before the filter is applied, which means the query takes forever. Running the same date filter in Kibana or via the Elasticsearch query DSL doesn't take nearly as long.

The best route would be to examine how the Catalyst optimizer reacts to the given query above. All hooks that are available are implemented at this time, and Date is definitely included among them.
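A quick way to examine what Catalyst does is to print the query plan and look for the pushed filters. A minimal sketch, assuming the es-hadoop Spark SQL integration and a placeholder index name:

```scala
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._  // es-hadoop Spark SQL integration

// Assumes an existing SQLContext named `sqlContext`; "myindex/data" and
// the field name are placeholders.
val df = sqlContext.esDF("myindex/data")
val filtered = df.filter(df("response_timestamp") > "2015-03-01")

// explain(true) prints both the logical and physical plans. If the
// predicate was handed to the data source, the scan node lists it under
// "PushedFilters:"; if it appears only as a separate Filter operator,
// Spark is applying it after loading the rows.
filtered.explain(true)
```

The same check works for a query built with sqlContext.sql(...): call explain(true) on the resulting DataFrame.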

Dear James, thanks for your response.
I'm not a developer on the elasticsearch-hadoop project, so I don't know where to begin evaluating the Catalyst optimizer.
I'm very surprised that the very simple, innocent query below doesn't push down into ES, and that no one has found or suggested a reworking of the query that makes the pushdown work.

SELECT * FROM table1 WHERE createdAt = DATE('3/6/2017');
My instance of table1 has 20 million records, and the query processor loads all 20 million records into Spark just to apply a simple date filter.
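One reworking worth trying: express the single day as a half-open range of ISO-8601 string literals instead of an equality on a DATE expression, so there is no function call for Catalyst to evaluate in the predicate. A sketch, assuming the es-hadoop Spark SQL integration, an existing SQLContext `sqlContext`, and a placeholder index name:

```scala
import org.elasticsearch.spark.sql._  // adds esDF(...) to SQLContext

// Placeholder index/type.
val df = sqlContext.esDF("myindex/table1")

// createdAt = DATE('3/6/2017') rewritten as a half-open range of plain
// string literals covering that one day; these stay simple comparison
// filters that Spark SQL can push down to the data source.
val oneDay = df.filter(
  df("createdAt") >= "2017-03-06" && df("createdAt") < "2017-03-07")

// Verify with oneDay.explain(true) that the scan lists PushedFilters.
oneDay.show()
```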