Hi, I'm new to Elasticsearch and I'm currently writing a Spark DataFrame to Elasticsearch. The DataFrame has the following schema:
df.printSchema()
root
|-- col1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- col11: string (nullable = true)
| | |-- col12: struct (nullable = true)
| | | |-- col121: string (nullable = true)
| | |-- col13: struct (nullable = true)
| | | |-- col131: string (nullable = true)
| | |-- col14: string (nullable = true)
| | |-- col15: struct (nullable = true)
| | | |-- col151: struct (nullable = true)
| | | | |-- col1511: string (nullable = true)
| | | | |-- col1512: long (nullable = true)
| | | | |-- col1513: long (nullable = true)
| | | |-- col152: struct (nullable = true)
| | | | |-- col1521: string (nullable = true)
| | | | |-- col1522: long (nullable = true)
| | | | |-- col1523: long (nullable = true)
| | |-- col16: string (nullable = true)
|-- my_id: string (nullable = true)
The field col1511 in the above DataFrame is always empty (null) for every record.
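To make this easier to reproduce, here is a simplified sketch of how a DataFrame with the same shape can be built (made-up values; only col11 and col15 are kept from the element struct, and col1511 is left as None everywhere, mirroring the real data):

from pyspark.sql import SparkSession
from pyspark.sql.types import (
    ArrayType, LongType, StringType, StructField, StructType,
)

spark = SparkSession.builder.getOrCreate()

# Explicit schema, since Spark cannot infer a type for a field that is None in every row.
col151_type = StructType([
    StructField('col1511', StringType(), True),
    StructField('col1512', LongType(), True),
    StructField('col1513', LongType(), True),
])
col152_type = StructType([
    StructField('col1521', StringType(), True),
    StructField('col1522', LongType(), True),
    StructField('col1523', LongType(), True),
])
element_type = StructType([
    StructField('col11', StringType(), True),
    StructField('col15', StructType([
        StructField('col151', col151_type, True),
        StructField('col152', col152_type, True),
    ]), True),
])
schema = StructType([
    StructField('col1', ArrayType(element_type, True), True),
    StructField('my_id', StringType(), True),
])

# col1511 is None in every record, exactly like the real data.
rows = [
    ([('a', ((None, 1, 2), ('x', 3, 4)))], 'id-1'),
    ([('b', ((None, 5, 6), ('y', 7, 8)))], 'id-2'),
]
df = spark.createDataFrame(rows, schema)
df.printSchema()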
I'm using the command below inside PySpark to write this DataFrame to a local Elasticsearch instance:
df.write.format('org.elasticsearch.spark.sql') \
    .option('es.nodes', 'localhost') \
    .option('es.port', '9200') \
    .option('es.resource', 'index_name') \
    .option('es.mapping.id', 'my_id') \
    .option('es.write.operation', 'index') \
    .save(mode='append')
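One way to confirm which fields actually made it into the index is to read it back through the same connector and print the schema (a sketch using the same connection settings as the write above):

# Read the index back and inspect the schema of what was stored.
es_df = spark.read.format('org.elasticsearch.spark.sql') \
    .option('es.nodes', 'localhost') \
    .option('es.port', '9200') \
    .load('index_name')
es_df.printSchema()  # col1511 does not show up here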
I can see all the fields appear in the Elasticsearch index except col1511. Even though it's always empty, I still need to keep it in Elasticsearch (as null, for example). Is there a config option that can make this happen, or any other suggestions?
I'm running PySpark 2.4 (Python 3.7) against Elasticsearch 7.5.1. I found es.field.read.empty.as.null, but according to the documentation it already defaults to yes. I also tried setting it to yes explicitly, and it still doesn't work.
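Concretely, this is what I mean by setting it explicitly; it's the same write as above with just the one extra option, and it made no difference:

df.write.format('org.elasticsearch.spark.sql') \
    .option('es.nodes', 'localhost') \
    .option('es.port', '9200') \
    .option('es.resource', 'index_name') \
    .option('es.mapping.id', 'my_id') \
    .option('es.write.operation', 'index') \
    .option('es.field.read.empty.as.null', 'yes') \
    .save(mode='append')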
Thanks in advance.