Indexing JSON with nested fields


(Buntu Dev) #1

I have JSON data with nested fields that I want to extract and turn into a Scala Map.

Here's a sample of the JSON:

"nested_field": [
  {
    "airport": "sfo",
    "score": 1.0
  },
  {
    "airport": "phx",
    "score": 1.0
  },
  {
    "airport": "sjc",
    "score": 1.0
  }
]

I want to construct a Scala Map and use saveToEs() to index the field into an ES index with the mapping below:

 "nested_field": {
    "properties": {
      "score": {
        "type": "double"
      },
      "airport": {
        "type": "keyword",
        "ignore_above": 1024
      }
    }
  }

The JSON file is read into a DataFrame using spark.read.json("example.json"). What's the right way to construct the Scala Map in this case?
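
In other words, for each document I'd like to end up with a structure roughly like this:

Map("nested_field" -> Seq(
  Map("airport" -> "sfo", "score" -> 1.0),
  Map("airport" -> "phx", "score" -> 1.0),
  Map("airport" -> "sjc", "score" -> 1.0)
))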

Thanks for any help!


(James Baiera) #2

This seems more like a generic Spark question. If you're asking how to parse JSON using Scala, that can be done in hundreds of different ways, all of which are reasonable solutions. That's not a very helpful tip on its own, though, so to give you a starting spot: ES-Hadoop uses the Jackson JSON libraries to parse JSON into objects and vice versa, so looking through that library is a good place to start.
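
For instance, since you already have the data in a DataFrame, one of those many approaches is to map each Row into a Scala Map and index the resulting RDD with saveToEs. A rough sketch, assuming your ES connection settings (es.nodes, etc.) are already configured and with the index/type name as a placeholder:

import org.apache.spark.sql.{Row, SparkSession}
import org.elasticsearch.spark._  // adds saveToEs to RDDs

val spark = SparkSession.builder().appName("es-index").getOrCreate()

// spark.read.json infers nested_field as an array of structs
val df = spark.read.json("example.json")

// Turn each Row into a Scala Map; the array of structs becomes a Seq of Maps,
// which ES-Hadoop serializes as a JSON array of objects
val docs = df.rdd.map { row =>
  val nested = row.getAs[Seq[Row]]("nested_field").map { r =>
    Map(
      "airport" -> r.getAs[String]("airport"),
      "score"   -> r.getAs[Double]("score")
    )
  }
  Map("nested_field" -> nested)
}

docs.saveToEs("myindex/mytype")  // placeholder index/type

The Seq of Maps ends up as an array of objects in the document source, which lines up with the mapping you posted.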

As for ensuring the mapping is what you want, I would suggest creating the index with your desired mapping before running your job. If you don't want to pre-create the index every time and would rather have ES-Hadoop create it, you can define an index template in Elasticsearch that will apply the mapping you want to any index whose name matches its pattern.
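
Just as an illustration, a template along those lines could look like the following (the template name and index pattern are placeholders, and the exact syntax varies by Elasticsearch version):

PUT _template/nested_field_template
{
  "index_patterns": ["myindex*"],
  "mappings": {
    "mytype": {
      "properties": {
        "nested_field": {
          "properties": {
            "score": {
              "type": "double"
            },
            "airport": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        }
      }
    }
  }
}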