Spark SQL thriftserver DDL?


(Andrew Wooster) #1

I have a working example using the Elasticsearch/Hive SQL connector. It allows me to use Tableau to query Elasticsearch. Here is my working Hive SQL DDL:

CREATE EXTERNAL TABLE biblio_raw (
    url   STRING,
    content_text_array ARRAY<STRING>
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.port' = '11100', 
  'es.resource' = 'my_index/doc', 
  'es.mapping.names' = 'content_text_array:contentText'
);

The Hive queries are painfully slow. I now want to create a similar view using Spark so that I can take advantage of the Elasticsearch/Spark push-down capabilities.

I start the Spark thrift server as follows:

$SPARK_HOME/sbin/start-thriftserver.sh --master spark://xxx.com:11407 --hiveconf hive.server2.thrift.port=11410 --jars ~/workspace/lib/elasticsearch-spark_2.10-2.2.0-rc1.jar

It is not clear to me what the equivalent Spark SQL DDL is to create the view above. I tried the following without success:

CREATE EXTERNAL TABLE biblio_raw (
    url   STRING,
    content_text_array ARRAY<STRING>
)
STORED BY 'org.elasticsearch.spark.sql'
TBLPROPERTIES(
  'es.port' = '11100', 
  'es.resource' = 'my_index/doc', 
  'es.mapping.names' = 'content_text_array:contentText'
);

It failed with the error: Cannot find class 'org.elasticsearch.spark.sql'


(Andrew Wooster) #2

I've figured out the corresponding Spark SQL syntax. The following temporary table works:

CREATE TEMPORARY TABLE biblio_raw (
    url   STRING,
    content_text_array ARRAY<STRING>
)
USING org.elasticsearch.spark.sql 
OPTIONS (
  path 'my_index/doc', 
  query '?q=mouse',
  es.port '11100',
  es.mapping.names 'content_text_array:contentText'
);

HOWEVER, es.mapping.names does NOT work. The DDL above does not map the content_text_array SQL column to the contentText ES field. How do I express mappings with Elasticsearch/Spark SQL?


(Costin Leau) #3

es.mapping.names is not a global configuration; it is available in Hive (mainly because Hive has various issues, including being case insensitive) but not in Spark. Since Spark preserves case, use the field's actual (mixed-case) name directly and you should be all set.
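If I read this correctly, the fix would look like the sketch below (untested, reusing the index and port from the thread): declare the column with the Elasticsearch field's actual casing and drop the Hive-only es.mapping.names option.

```sql
-- Sketch of the suggested fix: Spark SQL is case sensitive, so the column
-- can be named with the actual Elasticsearch field name (contentText),
-- making the Hive-only es.mapping.names option unnecessary.
CREATE TEMPORARY TABLE biblio_raw (
    url         STRING,
    contentText ARRAY<STRING>
)
USING org.elasticsearch.spark.sql
OPTIONS (
    path 'my_index/doc',
    es.port '11100'
);
```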

P.S. Note that the query param makes sense when reading. You can also use it when writing, however many find it confusing when they cannot read back the data they wrote (since the written data does not match the query bound to the table).
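To illustrate the point (a hypothetical sketch, with a made-up table name, building on the DDL earlier in the thread): a table declared with a query only ever reads back documents matching that query, regardless of what was written through it.

```sql
-- Hypothetical illustration: the query option is bound to the table
-- definition, so reads through this table are filtered by '?q=mouse'.
-- Rows written through it that do not match the query will not appear
-- in subsequent SELECTs.
CREATE TEMPORARY TABLE biblio_mouse (
    url         STRING,
    contentText ARRAY<STRING>
)
USING org.elasticsearch.spark.sql
OPTIONS (
    path 'my_index/doc',
    query '?q=mouse',
    es.port '11100'
);

-- Returns only documents matching the bound query:
SELECT url FROM biblio_mouse;
```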
