I have a working example using the Elasticsearch/Hive SQL connector. It allows me to use Tableau to query Elasticsearch. Here is my working Hive SQL DDL:
CREATE EXTERNAL TABLE biblio_raw (
url STRING,
content_text_array ARRAY<STRING>
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.port' = '11100',
'es.resource' = 'my_index/doc',
'es.mapping.names' = 'content_text_array:contentText'
);
The Hive queries are painfully slow. I now want to create a similar view using Spark so that I can take advantage of the elasticsearch / Spark push-down capabilities.
I start the Spark thrift server as follows:
$SPARK_HOME/sbin/start-thriftserver.sh --master spark://xxx.com:11407 --hiveconf hive.server2.thrift.port=11410 --jars ~/workspace/lib/elasticsearch-spark_2.10-2.2.0-rc1.jar
It is not clear to me what the equivalent Spark SQL DDL to create the view above. I tried the following without success:
CREATE EXTERNAL TABLE biblio_raw (
url STRING,
content_text_array ARRAY<STRING>
)
STORED BY 'org.elasticsearch.spark.sql'
TBLPROPERTIES(
'es.port' = '11100',
'es.resource' = 'my_index/doc',
'es.mapping.names' = 'content_text_array:contentText'
);
It failed with the error: Cannot find class 'org.elasticsearch.spark.sql'