I am using AWS EMR PySpark to write to AWS Elasticsearch Service (cluster version 6.3) using elasticsearch-hadoop-7.5.2.jar from https://www.elastic.co/downloads/hadoop.
Currently I have to create the index with mapping disabled first, and then write to that index. It would be awesome if the Spark write could automatically create the index with mapping disabled.
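For reference, this is roughly my current two-step flow. It is just a sketch: the endpoint, index name, and DataFrame are placeholders, and I am assuming "mapping disabled" means setting "enabled": false on the 6.x type mapping.

```python
import requests

# Placeholder endpoint and index name.
ES_ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"
INDEX = "my-index"

# Step 1: create the index with the mapping disabled ("enabled": false),
# so Elasticsearch stores _source but does not parse or index the fields.
requests.put(
    "{}/{}".format(ES_ENDPOINT, INDEX),
    json={"mappings": {"_doc": {"enabled": False}}},
)

# Placeholder DataFrame; `spark` is the active session on the EMR cluster.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Step 2: write the DataFrame to the pre-created index with ES-Hadoop.
(df.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", ES_ENDPOINT)
    .option("es.port", "443")
    .option("es.nodes.wan.only", "true")  # typical for an AWS-hosted cluster
    .mode("append")
    .save("{}/_doc".format(INDEX)))
```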
You could try defining your disabled mapping in an index template, which would then be applied to any indices created by ES-Hadoop. I don't believe that should cause any problems when writing the data from Hadoop. If the connector rejects that configuration on write, feel free to share the errors here and I can take a look.
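As a rough sketch, a 6.x index template with the mapping disabled could look like this (the endpoint, template name, and index pattern are placeholders):

```python
import requests

# Register a 6.x index template. Any index whose name matches
# index_patterns, including ones auto-created by ES-Hadoop on write
# (es.index.auto.create defaults to true), picks up this mapping.
requests.put(
    "https://my-domain.us-east-1.es.amazonaws.com/_template/disabled-mapping",
    json={
        "index_patterns": ["my-index*"],
        "mappings": {"_doc": {"enabled": False}},
    },
)
```

With the template in place, the manual index creation step should no longer be needed: the Spark write alone creates the index, and the template supplies the disabled mapping.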
One word of caution: you might run into issues if you plan to read the data back with ES-Hadoop from an index without mappings. ES-Hadoop uses the index's mappings to decide how to deserialize fields, so you may get different types for the data than when it was originally written. Additionally, you would need to set es.read.unmapped.fields.ignore to false, or the connector will throw out fields that are not mapped in ES.
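For example, a read back of such an index might look like the following (the endpoint and index name are placeholders; the key line is the es.read.unmapped.fields.ignore setting):

```python
# Read back from the unmapped index. Without the es.read.unmapped.fields.ignore
# override, the connector discards fields that have no mapping in Elasticsearch.
df = (spark.read
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "https://my-domain.us-east-1.es.amazonaws.com")
    .option("es.port", "443")
    .option("es.nodes.wan.only", "true")
    .option("es.read.unmapped.fields.ignore", "false")
    .load("my-index/_doc"))
```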