PySpark fails to read multiple nested levels of Elasticsearch index


#1

I'm trying to read document properties that are nested three levels deep, and I keep getting the familiar 'typically this occurs with arrays which are not mapped as single value' error.

The document I'm trying to read looks like this:

"_source": {
    "Actions": {
      "Agree": {
        "Total": 0,
        "Involved": false
      },
      "Disagree": {
        "Total": 0,
        "Involved": false
      },
      "Report": {
        "Total": 0,
        "Involved": false
      },
      "Comment": {
        "Total": 6,
        "Involved": false,
        "NextLoad": 0
      }
    }
}

I'm setting the field-specific config this way:

conf.set('es.read.as.array.include', 'Actions, Actions.Agree, Actions.Disagree, Actions.Comment')
conf.set('es.read.field.include', 'Actions.Agree.Total, Actions.Disagree.Total, Actions.Comment.Total')

I've tried variations of this config, either leaving out the read.as.array setting entirely or leaving the root 'Actions' entity out of it. Every combination gives the same result shown below.
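For comparison, the elasticsearch-hadoop configuration reference spells the array hint as `es.read.field.as.array.include`, and in the sample document above Agree, Disagree and Comment are plain objects rather than arrays. The following is only a sketch of passing the field filter per read through `.option(...)` with no array hint at all; `ES_READ_OPTIONS` and `apply_es_options` are hypothetical names, and nothing in this thread confirms that it resolves the error.

```python
# Assumption: option names follow the elasticsearch-hadoop configuration docs.
# ES_READ_OPTIONS and apply_es_options are illustrative names, not from the post.
ES_READ_OPTIONS = {
    # Only the three leaf fields the question wants to read.
    "es.read.field.include": (
        "Actions.Agree.Total,Actions.Disagree.Total,Actions.Comment.Total"
    ),
    # No array hint is set: the sample document shows objects, not arrays.
    # If a field really were an array, the documented key would be
    # "es.read.field.as.array.include".
}


def apply_es_options(reader, options=ES_READ_OPTIONS):
    """Apply each setting to a DataFrameReader via reader.option(key, value)."""
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader
```

It would be used as `apply_es_options(sqlContext.read.format("org.elasticsearch.spark.sql")).load('index/type')`.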

I set the schema and initialize the DataFrame this way:

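from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Flat schema: one column per dotted field name
schema = StructType([
    StructField("_id", StringType(), True),
    StructField("Actions.Agree.Total", IntegerType(), True),
    StructField("Actions.Disagree.Total", IntegerType(), True),
    StructField("Actions.Comment.Total", IntegerType(), True)
])

# Apply the manual schema when loading the index
df = sqlContext.read.format("org.elasticsearch.spark.sql").schema(schema).load('index/type')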

Trying to fetch results gives this error:

org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'Actions.Comment' not found; typically this occurs with arrays which are not mapped as single value

Calling printSchema() on the DataFrame prints the following structure:

root
|-- Actions.Agree.Total: integer (nullable = true)
|-- Actions.Disagree.Total: integer (nullable = true)
|-- Actions.Comment.Total: integer (nullable = true)

If I don't set a schema manually, printSchema() shows the following instead:

root
 |-- Actions: struct (nullable = true)
 |    |-- Agree: struct (nullable = true)
 |    |-- Comment: struct (nullable = true)
 |    |-- Disagree: struct (nullable = true)
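One thing worth noting: a StructField named "Actions.Agree.Total" declares a flat top-level column whose name happens to contain dots, not a path into the nested Actions struct. Below is a minimal sketch of an alternative, assuming the inferred nested schema is kept and the totals are projected afterwards with selectExpr. The helper names and the column aliases are hypothetical, and this thread does not confirm that the approach avoids the connector error.

```python
# Hypothetical helpers: function names and aliases are illustrative only.
ACTIONS = ("Agree", "Disagree", "Comment")


def total_expressions(actions=ACTIONS):
    """Build SQL projections such as 'Actions.Agree.Total AS AgreeTotal'."""
    return ["Actions.{0}.Total AS {0}Total".format(name) for name in actions]


def load_action_totals(spark, resource="index/type"):
    """Read the index with the inferred nested schema, then flatten the totals.

    `spark` can be a SparkSession or a SQLContext; both expose `.read`.
    """
    df = spark.read.format("org.elasticsearch.spark.sql").load(resource)
    # Dotted paths in a SQL expression walk into the nested struct columns.
    return df.selectExpr(*total_expressions())
```

With the sqlContext from the question this would be called as `load_action_totals(sqlContext)`, giving a DataFrame with columns AgreeTotal, DisagreeTotal and CommentTotal.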

Spark version is 2.2.0

Please advise

