I have run some tests and I don't run into this issue when filtering on a flat DataFrame structure. My problem is that I am building my documents in ES by joining multiple datasets. My structure looks like this:
myObject
|----- mySubObject1
|--------------------- field1
|--------------------- field2
|----- mySubObject2
|--------------------- field3
so I need to do:
val test = sqlContext.read.format("es").load("myIndex/myDocs").filter($"mySubObject1.field1".equalTo("value"))
First off, try using ES-Hadoop 2.3.0 or 2.2.1.
Regarding the pushdown, you can enable TRACE-level logging on the Spark package (it looks like you already did) and see whether anything shows up. Try a simple test (select A from X where B > 0) and you should see the query being translated.
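The suggested flat-field test could look roughly like this; it's a minimal sketch assuming a Spark 1.x `sqlContext`, the index from earlier in the thread, and a hypothetical numeric field `b` (the field name is a placeholder, not from the original posts):

```scala
// Sketch: raise connector logging to TRACE, then run a simple flat filter
// and watch the log for the translated Elasticsearch query.
import org.apache.log4j.{Level, Logger}

// TRACE on the connector's Spark SQL package prints translated queries.
Logger.getLogger("org.elasticsearch.spark.sql").setLevel(Level.TRACE)

import sqlContext.implicits._

val df = sqlContext.read.format("es").load("myIndex/myDocs")

// A flat filter like this should show up in the TRACE output as a
// pushed-down Elasticsearch range filter on "b".
df.filter($"b" > 0).show()
```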
Note that pushdown only happens if Spark triggers it. I'm not clear on what you mean by building "documents into ES joining".
I have already upgraded the connector to 2.3 without better results. As I said, I am building my documents by joining two datasets: I build myObject with a join between mySubObject1 and mySubObject2, which is why my data structure isn't flat.
I will have to flatten my schema, which requires an additional processing step in my Spark job...
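That flattening step could be sketched like this, assuming the nested structure shown earlier in the thread (the alias names are illustrative):

```scala
// Sketch: lift nested fields to top-level columns so filters on them are
// plain attribute references that the connector can push down.
import org.apache.spark.sql.functions.col
import sqlContext.implicits._

val nested = sqlContext.read.format("es").load("myIndex/myDocs")

val flat = nested.select(
  col("mySubObject1.field1").as("field1"),
  col("mySubObject1.field2").as("field2"),
  col("mySubObject2.field3").as("field3")
)

// The filter now targets a flat column instead of object.subObject.
flat.filter($"field1" === "value").show()
```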
The pushdown applies only to documents that are stored in ES. If your documents are joined (which, by the way, is an operation not pushed down by Spark), it means they exist in Spark, hence there's nothing really to be pushed down.
Of course my docs are in ES... Anyway, the problem is with non-flat data structures: with sub-documents inside documents (object.subObject in Spark SQL), the filter isn't pushed down.