Creating Arrays of Nested Objects using the Spark Connector


(Mike) #1

I have a Spark DataFrame of the schema:

 |-- ROW_ID: string (nullable = true)
 |-- SUBJECT_ID: string (nullable = true)
 |-- HADM_ID: string (nullable = true)
 |-- CHARTDATE: string (nullable = true)
 |-- CHARTTIME: string (nullable = true)
 |-- STORETIME: string (nullable = true)
 |-- CATEGORY: string (nullable = true)
 |-- DESCRIPTION: string (nullable = true)
 |-- CGID: string (nullable = true)
 |-- ISERROR: string (nullable = true)
 |-- TEXT: string (nullable = true)
 |-- annotations: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- subject: string (nullable = true)
 |    |    |-- polarity: integer (nullable = false)
 |    |    |-- confidence: float (nullable = false)
 |    |    |-- historyOf: integer (nullable = false)
 |    |    |-- ontologyMappings: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- preferredText: string (nullable = true)
 |    |    |    |    |-- codingScheme: string (nullable = true)
 |    |    |    |    |-- code: string (nullable = true)
 |    |    |    |    |-- cui: string (nullable = true)
 |    |    |    |    |-- tui: string (nullable = true)

I am indexing this entire structure in Elasticsearch, but neither the annotations field (an array of StructTypes) nor the ontologyMappings field shows up as a nested type in the resulting mapping. For example, the generated mapping for ontologyMappings is shown below:

"ontologyMappings": {
                "properties": {
                  "code": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "codingScheme": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "cui": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "preferredText": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "tui": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },

Is there a way to force these to be written as nested types instead of plain objects with property fields? I would like to run queries that find documents containing an annotation whose polarity is 1 and whose ontologyMappings include a particular code. Without nesting, the association between values that belong to the same array element is lost, so this kind of query is impossible.
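To illustrate the problem, here is a rough Python sketch of what happens when an array of objects is indexed without a nested mapping: the leaf values are collected into one multi-valued field per dotted path, so the link between a code and its polarity disappears. The sample document, the code values, and the helper names (flatten, flat_match, nested_match) are made up for illustration; this only imitates the behaviour and is not how Elasticsearch is implemented.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Collect every leaf value into one multi-valued field per dotted path,
    roughly the way a non-nested object field is indexed."""
    fields = defaultdict(set)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields[sub_path] |= sub_values
            else:
                fields[path].add(item)
    return fields


# Hypothetical document with two annotations (values are invented).
doc = {
    "annotations": [
        {"polarity": 1, "ontologyMappings": [{"code": "A1"}]},
        {"polarity": -1, "ontologyMappings": [{"code": "B2"}]},
    ]
}


def flat_match(flat, code, polarity):
    # Each condition is checked against the whole document, not one element.
    return (code in flat["annotations.ontologyMappings.code"]
            and polarity in flat["annotations.polarity"])


def nested_match(document, code, polarity):
    # Both conditions must hold inside the same annotation.
    return any(
        a["polarity"] == polarity
        and any(m["code"] == code for m in a["ontologyMappings"])
        for a in document["annotations"]
    )


flat = flatten(doc)
print(flat_match(flat, "B2", 1))   # True: a false positive across annotations
print(nested_match(doc, "B2", 1))  # False: no single annotation has both
```

Code B2 only ever appears with polarity -1, yet the flattened view still reports a match for polarity 1. That is the behaviour I want to avoid.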


(James Baiera) #2

ES-Hadoop relies on Elasticsearch's built-in automatic field mapping. By default, Elasticsearch does not create nested fields when given an object; it maps it as a plain object instead. To work around this, we suggest that users set up index and mapping templates in Elasticsearch, which are picked up at index creation time.
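As a rough sketch of what such a template could contain, the Python snippet below builds a template body that declares the two array fields from the question as nested. It assumes the 6.x-style legacy template format (an index_patterns list plus a mappings section keyed by document type); other Elasticsearch versions use a different layout. The index pattern "notes-*", the type name "data", and the helper name nested_template are placeholders, not values taken from this thread's cluster.

```python
import json


def nested_template(index_pattern, doc_type, nested_paths):
    """Build a template body in which every segment of each dotted path
    in nested_paths is declared as a nested field."""
    properties = {}
    for path in nested_paths:
        node = properties
        parts = path.split(".")
        for depth, part in enumerate(parts):
            field = node.setdefault(part, {"type": "nested"})
            if depth < len(parts) - 1:
                # Descend into the sub-properties of the parent nested field.
                node = field.setdefault("properties", {})
    return {
        "index_patterns": [index_pattern],  # assumed 6.x legacy template key
        "mappings": {doc_type: {"properties": properties}},
    }


body = nested_template(
    "notes-*",  # hypothetical index pattern
    "data",     # hypothetical document type
    ["annotations", "annotations.ontologyMappings"],
)
print(json.dumps(body, indent=2))
```

The printed JSON would be sent once as the body of a template creation request before the Spark job first writes to a matching index. Fields not listed in the template are still mapped dynamically.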


(Mike) #3

Thanks for the reply, James. We ended up doing exactly that: we seed the indices we generate with the proper nested schema (i.e., we create the index with the nested fields declared explicitly and let the upserts add the non-nested ones).

What the mapping seed looks like:

"""
{
   "mappings":{
      "data":{
         "properties":{
            "emberNLP":{
               "type":"nested",
               "properties":{
                  "ontologyMappings":{
                     "type":"nested",
                     "properties":{
                        "code":{
                           "type":"text",
                           "fields":{
                              "keyword":{
                                 "type":"keyword"
                              }
                           }
                        },
                        "codingScheme":{
                           "type":"text",
                           "fields":{
                              "keyword":{
                                 "type":"keyword"
                              }
                           }
                        },
                        "cui":{
                           "type":"text",
                           "fields":{
                              "keyword":{
                                 "type":"keyword"
                              }
                           }
                        },
                        "preferredText":{
                           "type":"text",
                           "fields":{
                              "keyword":{
                                 "type":"keyword"
                              }
                           }
                        },
                        "tui":{
                           "type":"text",
                           "fields":{
                              "keyword":{
                                 "type":"keyword"
                              }
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}
    """
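With a seed like this in place, the query from the original question can be written as a nested query wrapped inside another nested query. The sketch below only builds the request body as a Python dict. It assumes that polarity sits directly under emberNLP (it is not listed in the seed, so it would be added dynamically) and that the code is matched through its keyword sub-field. The function name and the sample code value "12345" are placeholders.

```python
import json


def code_with_polarity_query(code, polarity,
                             outer="emberNLP", inner="ontologyMappings"):
    """Match documents where one outer element has the given polarity and
    one of its own inner mappings has the given code."""
    inner_path = f"{outer}.{inner}"
    return {
        "query": {
            "nested": {
                "path": outer,
                "query": {
                    "bool": {
                        "must": [
                            {"term": {f"{outer}.polarity": polarity}},
                            {
                                "nested": {
                                    "path": inner_path,
                                    "query": {
                                        "term": {
                                            f"{inner_path}.code.keyword": code
                                        }
                                    },
                                }
                            },
                        ]
                    }
                },
            }
        }
    }


search_body = code_with_polarity_query("12345", 1)  # placeholder code value
print(json.dumps(search_body, indent=2))
```

Because both conditions are evaluated inside the same emberNLP element, a code that only occurs alongside a different polarity no longer produces a match.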
