SparkSQL Index Mapping, Partition issues

tridib · October 30, 2015, 9:03pm

I am saving a DataFrame into Elasticsearch. The resource looks like this: {partitionField}/{docType}
JavaEsSparkSQL.saveToEs(dataFrame, "{partitionField}/{docType}")

It creates one index for each partition value. There are a few issues I am facing:

If the partitionField has an upper-case value, Elasticsearch throws an exception saying the index name can't contain upper-case characters.
There are nested documents that need to be mapped as the "nested" type. Because the index is created at runtime based on the data, I am unable to create it beforehand with the proper mapping. Is there a way to pass the index specification while saving data to Elasticsearch?

Thanks & Regards
Tridib
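To make issue 1 concrete, here is a minimal sketch of how a `{field}` pattern turns into one index name per distinct value, together with the obvious workaround of lower-casing the value before it becomes part of the index name. It is plain Java for illustration only (Java 9+ assumed); the class and method names (`IndexPatternDemo`, `resolve`, `indexName`) are hypothetical and this is not ES-Hadoop's actual implementation.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexPatternDemo {
    // Matches placeholders such as {partitionField} in a resource pattern.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^}]+)\\}");

    /** Replaces every {field} placeholder with that field's value from the record. */
    public static String resolve(String pattern, Map<String, String> record) {
        Matcher m = PLACEHOLDER.matcher(pattern);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = record.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing field: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Resolves only the index part of "index/type" and lower-cases it. */
    public static String indexName(String resource, Map<String, String> record) {
        int slash = resource.indexOf('/');
        String indexPattern = slash >= 0 ? resource.substring(0, slash) : resource;
        return resolve(indexPattern, record).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("partitionField", "Sales", "docType", "order"),
            Map.of("partitionField", "sales", "docType", "order"),
            Map.of("partitionField", "Support", "docType", "ticket"));
        // One index per distinct (lower-cased) partition value.
        Set<String> indices = new TreeSet<>();
        for (Map<String, String> row : rows) {
            indices.add(indexName("{partitionField}/{docType}", row));
        }
        System.out.println(indices); // [sales, support]
    }
}
```

Note that lower-casing merges values that differ only by case ("Sales" and "sales") into a single index. In Spark itself, the equivalent would presumably be lower-casing the column (for example with `org.apache.spark.sql.functions.lower`) before calling `saveToEs`, keeping in mind that this also changes the value stored in the documents.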

costin · November 7, 2015, 9:16pm

If the partitionField has an upper-case value, Elasticsearch throws an exception saying the index name can't contain upper case.

That's because Elasticsearch itself does not allow upper-case index names. ES-Hadoop therefore validates the input before deploying the job, rather than running the job only to have it abort, and thus saves time.
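The same fail-fast idea can be applied on the user side: check the candidate index names before any data is written. The sketch below assumes the distinct names have already been collected (in Spark, something like `dataFrame.select("partitionField").distinct()` would give them). It only checks the upper-case rule discussed here, not the other Elasticsearch naming restrictions, and the class `IndexNameCheck` is hypothetical (Java 9+ assumed).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

public class IndexNameCheck {
    /** Returns the candidate index names that contain upper-case characters. */
    public static List<String> upperCaseNames(Collection<String> names) {
        List<String> bad = new ArrayList<>();
        for (String name : names) {
            if (!name.equals(name.toLowerCase(Locale.ROOT))) {
                bad.add(name);
            }
        }
        return bad;
    }

    /** Throws before any data is written if some name would be rejected. */
    public static void validate(Collection<String> names) {
        List<String> bad = upperCaseNames(names);
        if (!bad.isEmpty()) {
            throw new IllegalArgumentException("Index names must be lower case: " + bad);
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("sales", "Support");
        System.out.println(upperCaseNames(names)); // [Support]
        try {
            validate(names);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Index names must be lower case: [Support]
        }
    }
}
```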

Is there a way to pass the index specification while saving data to Elasticsearch?

No. This is done on purpose, since there's no reliable way across all libraries to load the mapping and save it to Elasticsearch. Also, in practice it would require a JSON mapping, and since creating the mapping is a one-time step, it provides no real benefit over doing things manually, especially when using index templates (which achieve the same thing directly in Elasticsearch), as explained here.
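As a sketch of the template approach, the code below builds a template that maps one field as `nested` for every index matching a pattern, so indices created at runtime by the job pick up the mapping. Several assumptions apply: Elasticsearch 7.8 or later, where composable templates live at `_index_template` (older versions, such as the 2.x releases current when this thread was written, use `_template` with a differently shaped body); Java 11+ for `java.net.http`; a hypothetical fixed prefix `data-` in the resource pattern so the template pattern `data-*` does not match unrelated indices; and a hypothetical nested field called `items`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NestedTemplate {
    /** JSON body for a composable index template mapping one field as "nested".
     *  Inputs are assumed not to need JSON escaping. */
    public static String templateBody(String indexPattern, String nestedField) {
        return "{"
            + "\"index_patterns\":[\"" + indexPattern + "\"],"
            + "\"template\":{\"mappings\":{\"properties\":{"
            + "\"" + nestedField + "\":{\"type\":\"nested\"}"
            + "}}}}";
    }

    /** Builds (but does not send) the PUT request that installs the template. */
    public static HttpRequest putTemplate(String esUrl, String name, String body) {
        return HttpRequest.newBuilder(URI.create(esUrl + "/_index_template/" + name))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) throws Exception {
        String body = templateBody("data-*", "items");
        System.out.println(body);
        // Only contact a cluster when its URL is passed, e.g. http://localhost:9200
        if (args.length > 0) {
            HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(putTemplate(args[0], "data-nested", body), HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + resp.body());
        }
    }
}
```

The template has to be installed before the Spark job first writes to a matching index: templates are applied only when an index is created, so indices that already exist keep their current mapping.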

Related topics

Topic	Category	Tag	Replies	Views	Activity
(apache spark df).saveToES(elastic search)	Elasticsearch	es-hadoop	3	2039	March 26, 2017
Found duplicate column(s) in the data schema, Need help on how to load such index data into Spark Dataframe	Elasticsearch	es-hadoop	2	14331	March 11, 2019
Elastic-Spark connector : How to read data fro ES Index which has nested Json with array fields	Elasticsearch	es-hadoop	2	519	July 20, 2022
Spark-ES saveToES without type	Elasticsearch	es-hadoop	2	2097	April 13, 2018
Spark Structured Streaming timeseries indices	Elasticsearch	es-hadoop	3	888	August 8, 2019