Based on the discussion about ES use cases, I was wondering whether there is any way Spark could benefit from ES aggregations and convert their results to a DataFrame. For example, something like:

    val esConf = ...
    val esQuery = """{"aggs" : {"my_agg" : {"terms" : {"field": "field_A"} } } }"""
    val jsonResult = client.search(esQuery, esConf)
    val transformer = ...
    val df = jsonResult.toDF(transformer)
    val result = df.filter(...).join(otherDf) ....
A) Is there any plan to support something similar on the ES roadmap?
B) If I understand correctly how spark-es/hadoop-es works, the library "detects" the DataFrame's schema from the scroll's JSON results. Can you direct me to the classes that are responsible for this detection? I was wondering whether these components could be used to build the 'transformer' from my example.
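To make the idea of a 'transformer' more concrete, here is a minimal sketch of one possible shape for it, using only Spark's own JSON schema inference rather than anything from ES-Hadoop. It assumes the aggregation has already been run against Elasticsearch by some other client and that the raw JSON response is available as a string. The object name `TermsAggToDataFrame`, the helper `termsAggToDF`, and the sample JSON are all hypothetical and only illustrate the general layout of a terms aggregation response (an `aggregations` object holding named `buckets` with `key` and `doc_count`).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object TermsAggToDataFrame {
  // Hypothetical helper (not part of ES-Hadoop): turns the JSON body of a search
  // response containing a terms aggregation named `aggName` into a DataFrame
  // with one row per bucket.
  def termsAggToDF(spark: SparkSession, responseJson: String, aggName: String): DataFrame = {
    import spark.implicits._
    // Let Spark infer the schema of the whole response from the single JSON document.
    val raw = spark.read.json(Seq(responseJson).toDS())
    raw
      // One output row per bucket of the aggregation.
      .select(explode(col(s"aggregations.$aggName.buckets")).as("bucket"))
      // Flatten the bucket struct into plain columns.
      .select(col("bucket.key").as("key"), col("bucket.doc_count").as("doc_count"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("terms-agg-sketch")
      .getOrCreate()
    // Hand-written sample in the shape of a terms aggregation response (assumed, not real output).
    val sample =
      """{"aggregations":{"my_agg":{"buckets":[{"key":"a","doc_count":3},{"key":"b","doc_count":1}]}}}"""
    val df = termsAggToDF(spark, sample, "my_agg")
    df.show()
    spark.stop()
  }
}
```

The resulting DataFrame can then be filtered or joined like any other. Because the aggregation runs as a single request, this only suits results small enough to fit in one response, and nested or multi-level aggregations would need their own flattening logic.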
A) Aggregations are not currently supported by ES-Hadoop; it's the next major item on the roadmap.
B) All the Spark SQL classes reside under their dedicated package: