Hello everyone, I am using Elasticsearch version 6.0.1 from the AWS service, and I am trying to read ES index data into a Spark DataFrame using PySpark.
I can read all fields except the ones that contain nested arrays; the nested array itself contains another nested array inside it.
The mapping in the index is below (after it I have sketched the Spark schema I would expect it to produce):
{
  "accounts": {
    "type": "nested",
    "properties": {
      "accountClassificationOne": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "alternateNames": {
        "type": "nested",
        "properties": {
          "createDate": {
            "properties": {
              "chronology": {
                "type": "object"
              },
              "millis": {
                "type": "long"
              }
            }
          },
          "inactive": {
            "type": "boolean"
          },
          "name": {
            "type": "text",
            "fields": {
              "autocomplete": {
                "type": "text",
                "analyzer": "customer_synonym_autocomplete",
                "search_analyzer": "customer_synonym"
              },
              "de": {
                "type": "text",
                "analyzer": "customer_german_autocomplete",
                "search_analyzer": "german"
              },
              "full": {
                "type": "text",
                "analyzer": "customer_synonym_full"
              },
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "normalize": {
                "type": "keyword",
                "normalizer": "lowercase_normalizer"
              }
            },
            "analyzer": "customer_synonym"
          }
        },
        "badDebt": {
          "type": "boolean"
        }
      }
    }
  }
}
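For clarity, the Spark schema I would expect the accounts mapping above to produce looks roughly like this (my own sketch, abbreviated to the fields relevant to the error, not output from the connector):

from pyspark.sql.types import (
    ArrayType, BooleanType, LongType, StringType, StructField, StructType
)

# Expected shape of the "accounts" column: an array of structs, where
# "alternateNames" is itself an array of structs inside each account.
expected_accounts_schema = ArrayType(StructType([
    StructField("accountClassificationOne", StringType()),
    StructField("alternateNames", ArrayType(StructType([
        StructField("createDate", StructType([
            StructField("chronology", StructType([])),  # "object" with no mapped sub-fields
            StructField("millis", LongType()),
        ])),
        StructField("inactive", BooleanType()),
        StructField("name", StringType()),
    ]))),
]))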
The read options in my PySpark code, and roughly how they are passed to the reader, are:
es_options_read = {
    "es.nodes": es_nodes,
    "es.port": "443",
    "es.resource": "index_name/type",
    "es.query": myquery,
    "es.nodes.wan.only": "true",
    "es.read.field.as.array.include": "accounts",
    "es.read.field.include": "accounts"
}
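This is roughly how I load the DataFrame with these options (es_nodes and myquery are defined earlier in my script):

from pyspark.sql import SparkSession

# Build the session and read from Elasticsearch with the options above.
spark = SparkSession.builder.appName("es-read").getOrCreate()

df = (
    spark.read.format("org.elasticsearch.spark.sql")
    .options(**es_options_read)
    .load()
)
df.printSchema()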
The error is:
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'updateDate.chronology' not found; typically this occurs with arrays which are not mapped as single value
Sometimes I get another error instead: java.lang.NullPointerException.
I have tried multiple combinations of es.read.field.as.array.include, struct fields, explode, and many other read options, but with no luck; one of the attempts is sketched below. Could anyone help me with this?
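For example, one of the combinations I tried (listing both nested paths in es.read.field.as.array.include and then exploding the arrays) looks roughly like this, and it still fails with the same errors:

from pyspark.sql.functions import col, explode

# Mark both nested paths as arrays before reading.
es_options_read["es.read.field.as.array.include"] = "accounts,accounts.alternateNames"

df = (
    spark.read.format("org.elasticsearch.spark.sql")
    .options(**es_options_read)
    .load()
)

# Flatten accounts, then alternateNames, to get one row per alternate name.
flattened = (
    df.select(explode(col("accounts")).alias("account"))
      .select(
          col("account.accountClassificationOne").alias("accountClassificationOne"),
          explode(col("account.alternateNames")).alias("alt"),
      )
      .select("accountClassificationOne", "alt.name", "alt.inactive")
)
flattened.show()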