Since Elasticsearch supports a SQL (JDBC) interface as of v6.5, can Spark (RDD or SQL) read data via JDBC? If it is feasible, how should security be handled on both sides (Elasticsearch and the Spark job)? The environment is currently configured to use user/password over SSL for both Logstash and Kibana.
The JDBC driver can be used by any JVM consumer, including Spark RDD/SQL. Since the driver is for Elasticsearch, it takes care only of its own security details, in particular how to communicate securely with Elasticsearch (more information at https://www.elastic.co/guide/en/elasticsearch/reference/master/sql-jdbc.html).
The Spark side would have to be handled inside the Spark job accordingly.
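For illustration, a minimal sketch of such a job in Scala, assuming hypothetical host, index, and credential names. The `jdbc:es://` URL scheme and the `user`/`password` properties come from the Elasticsearch SQL JDBC docs, though the driver class name has varied across versions (`org.elasticsearch.xpack.sql.jdbc.jdbc.JdbcDriver` in the 6.x line):

```scala
import org.apache.spark.sql.SparkSession

object EsJdbcRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("es-sql-jdbc-read")
      .getOrCreate()

    // Hypothetical host and index names; the x-pack-sql-jdbc jar must be on
    // the driver and executor classpaths. An https:// URL plus user/password
    // matches the "user/password over SSL" setup on the Elasticsearch side.
    val df = spark.read
      .format("jdbc")
      .option("driver", "org.elasticsearch.xpack.sql.jdbc.jdbc.JdbcDriver")
      .option("url", "jdbc:es://https://es-node.example.com:9200")
      .option("dbtable", "my_index")
      .option("user", "spark_reader")
      .option("password", sys.env.getOrElse("ES_PASSWORD", ""))
      .load()

    df.show(10)
    spark.stop()
  }
}
```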
Do note that the JDBC driver doesn't offer the parallelization of the Es-Spark/Hadoop integration; on the other hand, it provides much richer and more powerful SQL capabilities.
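If parallel reads matter more than SQL expressiveness, the native connector is the usual route. A sketch using the documented es-hadoop settings (host, index, and user names are placeholders):

```scala
// The elasticsearch-spark connector parallelizes reads across index shards;
// the es.net.* options below are es-hadoop's documented security settings.
val esDf = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "es-node.example.com")
  .option("es.port", "9200")
  .option("es.net.ssl", "true")
  .option("es.net.http.auth.user", "spark_reader")
  .option("es.net.http.auth.pass", sys.env.getOrElse("ES_PASSWORD", ""))
  .load("my_index")
```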
Is there a way to retrieve ALL data from an index, bypassing the defaults (page.size, default 1000)? I guess Spark will not take care of pagination in this case, and the total size is unknown to the job too.
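For reference, a sketch of where that setting can be passed, assuming the `page.size` connection property from the ES SQL docs (placeholder host). As documented, it controls the per-request fetch size rather than capping the total: the driver keeps following the server-side cursor until the result set is exhausted.

```scala
// Sketch: page.size appended as a URL option. A full-index query should still
// return all rows; this only changes the batch size per round trip.
val fullIndex = spark.read
  .format("jdbc")
  .option("driver", "org.elasticsearch.xpack.sql.jdbc.jdbc.JdbcDriver")
  .option("url", "jdbc:es://https://es-node.example.com:9200/?page.size=10000")
  .option("dbtable", "my_index")
  .option("user", "spark_reader")
  .option("password", sys.env.getOrElse("ES_PASSWORD", ""))
  .load()
```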