SparkSQL Index Mapping, Partition issues


(Tridib) #1

I am saving a DataFrame into Elasticsearch. The resource looks like this: {partitionField}/{docType}
JavaEsSparkSQL.saveToEs(dataFrame, "{partitionField}/{docType}")

It creates one index for each partition value. There are a few issues I am facing:

  1. If partitionField has an upper-case value, Elasticsearch throws an exception saying the index name can't contain upper-case characters.
  2. There are nested documents that need to be mapped with the "nested" type. Because the index is created at runtime based on the data, I can't create it beforehand with the proper mapping. Is there a way to pass the index specification while saving data to Elasticsearch?

Thanks & Regards
Tridib


(Costin Leau) #2
  1. If partitionField has an upper-case value, Elasticsearch throws an exception saying the index name can't contain upper-case characters.

That's because Elasticsearch itself does not allow upper-case index names. ES-Hadoop validates the input before deploying the job, instead of deploying it only to have it abort, and thus saves time.
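Since the restriction comes from Elasticsearch, one option is to normalize partition values before they end up in the index name. The following is a minimal sketch, not part of ES-Hadoop: the class `IndexNames` and method `toIndexName` are hypothetical, and only a subset of Elasticsearch's index-name rules (lower case, no `\ / * ? " < > | , #` or spaces) is handled.

```java
import java.util.Locale;

public class IndexNames {
    // Characters Elasticsearch rejects in index names (a subset of the documented rules).
    private static final String FORBIDDEN = "\\/*?\"<>| ,#";

    // Hypothetical helper: lower-cases a partition value and replaces forbidden characters with '-'.
    static String toIndexName(String partitionValue) {
        // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
        String lower = partitionValue.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(lower.length());
        for (char c : lower.toCharArray()) {
            sb.append(FORBIDDEN.indexOf(c) >= 0 ? '-' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints "electronics-tv"
        System.out.println(toIndexName("Electronics/TV"));
    }
}
```

Inside the Spark job itself, the lower-casing part can be done on the column before calling saveToEs, e.g. with `org.apache.spark.sql.functions.lower(col("partitionField"))` in `withColumn`, so that every generated index name is already valid.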

Is there a way to pass the index specification while saving data to Elasticsearch?

No. This is done on purpose, since there's no reliable way across all libraries to load the mapping and save it to Elasticsearch. Also, in practice it would require a JSON mapping, and since creating it is a one-time step, it offers no real benefit over doing it manually, especially when using templates (which achieve the same thing directly in Elasticsearch), as explained here.
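As a rough illustration of the template approach, the sketch below registers an index template that maps a field as `nested`, so any index created later whose name matches the pattern picks up that mapping. It assumes Elasticsearch 7.8+ with composable templates (`PUT _index_template/...`) reachable at `localhost:9200`; older clusters like the one in this thread use the legacy `_template` endpoint and a per-type mapping instead. The template name `sales-template`, the pattern `sales-*` and the field `items` are hypothetical placeholders.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NestedTemplate {
    // Builds a composable index template body that maps one field as "nested".
    // The pattern and field name are hypothetical placeholders.
    static String templateBody(String indexPattern, String nestedField) {
        return "{"
            + "\"index_patterns\": [\"" + indexPattern + "\"],"
            + "\"template\": {"
            +   "\"mappings\": {"
            +     "\"properties\": {"
            +       "\"" + nestedField + "\": { \"type\": \"nested\" }"
            +     "}"
            +   "}"
            + "}"
            + "}";
    }

    public static void main(String[] args) throws InterruptedException {
        String body = templateBody("sales-*", "items");
        // Assumed local cluster address and hypothetical template name.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/_index_template/sales-template"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException e) {
            System.out.println("Could not reach Elasticsearch: " + e.getMessage());
        }
    }
}
```

For a template pattern to match the per-partition indices, the index names need something predictable in common. ES-Hadoop's dynamic resource patterns accept a constant prefix, so a resource such as `sales-{partitionField}/{docType}` (hypothetical prefix) would produce index names that `sales-*` matches.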

