Writing to new index using Spark with mapping disabled

Hi all,

I am using PySpark on AWS EMR to write to AWS Elasticsearch (cluster version 6.3), using elasticsearch-hadoop-7.5.2.jar downloaded from https://www.elastic.co/downloads/hadoop.

I am trying to write to a new index with mapping disabled (as mentioned here: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/enabled.html).

Is this possible? None of the configuration options (https://www.elastic.co/guide/en/elasticsearch/hadoop/6.3/configuration.html) seem to allow it.

Currently, I have to create the index with mapping disabled first, and then write to it. However, it would be great if the Spark write could create the index with mapping disabled automatically.
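For context, here is a sketch of my current two-step workaround. The index name `events` and the 6.x mapping type `doc` are just examples, and the cluster calls are shown as comments:

```python
import json

# Assumed index name "events" and mapping type "doc" (ES 6.x requires a
# single mapping type). The whole document's mapping is disabled.
index_body = {
    "mappings": {
        "doc": {
            "enabled": False  # store _source but do not index/parse fields
        }
    }
}

# Step 1: create the index up front, e.g. with the requests library:
#   requests.put(f"https://{es_host}/events", json=index_body)

# Step 2: write to the now-existing index from PySpark:
#   (df.write
#      .format("org.elasticsearch.spark.sql")
#      .option("es.nodes", es_host)
#      .option("es.port", "443")
#      .option("es.nodes.wan.only", "true")  # typical for AWS-hosted ES
#      .mode("append")
#      .save("events/doc"))

print(json.dumps(index_body))
```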

Looking forward to discussing this,

You could try defining your disabled mapping in an index template, which would then be applied to any matching indices created by ES-Hadoop. I don't believe that should cause any problems when writing the data from Hadoop. If the connector rejects that configuration when writing, feel free to share the errors here and I'll take a look.
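A rough sketch of what such a template could look like. The template name, the `events-*` index pattern, and the `doc` mapping type are assumptions; the PUT call is shown as a comment:

```python
import json

# Hypothetical index template: any index matching "events-*" is created
# with mapping disabled, so indices auto-created by ES-Hadoop pick it up.
# "index_patterns" is the ES 6.x field name for template matching.
template = {
    "index_patterns": ["events-*"],
    "mappings": {
        "doc": {
            "enabled": False  # disable mapping for the whole document
        }
    }
}

# Register it once, e.g. with the requests library:
#   requests.put(f"https://{es_host}/_template/events_disabled", json=template)

print(json.dumps(template, indent=2))
```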

As a cautionary note, you may run into issues if you plan to read the data back with ES-Hadoop from an index without mappings. ES-Hadoop uses the index mappings to decide how to deserialize fields, so you may get back different field types than were originally written. Additionally, you would need to set es.read.unmapped.fields.ignore to false, or the connector will drop fields that are not mapped in ES.
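A sketch of what the read side could look like with that setting applied. The host name and index/type path are placeholders, and the actual Spark read is commented out since it needs a live cluster:

```python
# Connector options for reading back from an unmapped index. The host
# value is a placeholder; es.read.unmapped.fields.ignore=false tells the
# connector to keep fields that have no mapping in ES.
read_opts = {
    "es.nodes": "my-domain.es.amazonaws.com",   # placeholder endpoint
    "es.port": "443",
    "es.nodes.wan.only": "true",                # typical for AWS-hosted ES
    "es.read.unmapped.fields.ignore": "false",  # do not drop unmapped fields
}

# df = (spark.read
#         .format("org.elasticsearch.spark.sql")
#         .options(**read_opts)
#         .load("events/doc"))

print(read_opts["es.read.unmapped.fields.ignore"])
```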

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.