Loading JSON documents into Elasticsearch via the es-spark connector

If I have a Spark DataFrame full of JSON documents with the following schema, does the es-hadoop connector allow indexing them into Elasticsearch?

scala> df.printSchema
root
|-- address: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- location: string (nullable = true)
| | |-- std_city: string (nullable = true)
| | |-- std_state: string (nullable = true)
| | |-- std_street_name: string (nullable = true)
| | |-- std_street_number: string (nullable = true)
| | |-- std_zip: string (nullable = true)
|-- date_of_birth: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- dob_day: string (nullable = true)
| | |-- dob_full: string (nullable = true)
| | |-- dob_month: string (nullable = true)
| | |-- dob_year: string (nullable = true)
|-- names: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- first_name: string (nullable = true)
| | |-- first_name_list: string (nullable = true)
| | |-- fn_formalname_assocs: string (nullable = true)
| | |-- fn_nickname_assocs: string (nullable = true)
| | |-- last_name: string (nullable = true)
| | |-- last_name_list: string (nullable = true)
| | |-- ln_formalname_assocs: string (nullable = true)
| | |-- ln_nickname_assocs: string (nullable = true)
| | |-- middle_name: string (nullable = true)
| | |-- middle_name_list: string (nullable = true)
| | |-- mn_formalname_assocs: string (nullable = true)
| | |-- mn_nickname_assocs: string (nullable = true)
|-- phone: array (nullable = true)
| |-- element: string (containsNull = true)
|-- pm_id: string (nullable = true)
|-- id: string (nullable = true)

I will have a mapping in Elasticsearch to match the above. Also, is there a way to specify the id field from the JSON document as the _id in Elasticsearch?

This doesn't look like it would be hard to ingest with ES-Hadoop. If you haven't had a chance to look at our Spark documentation yet, I recommend it.

is there a way to specify the id field from the json document as the _id for elastic

You're looking for the es.mapping.id setting for that: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#cfg-mapping
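
For illustration, here is a minimal sketch of how that setting might be passed when writing the DataFrame above. It assumes the elasticsearch-spark artifact is on the classpath and that the cluster is reachable at localhost:9200. The object name EsIndexSketch, the helper names, and the index name "people" are hypothetical, so adjust them to your setup.

```scala
import org.apache.spark.sql.DataFrame
import org.elasticsearch.spark.sql._ // brings saveToEs into scope for DataFrames

object EsIndexSketch {
  // Connector settings. Host and port are placeholder assumptions;
  // es.mapping.id names the document field whose value becomes _id.
  def esOptions(idField: String): Map[String, String] = Map(
    "es.nodes"      -> "localhost",
    "es.port"       -> "9200",
    "es.mapping.id" -> idField
  )

  // Writes every row of df to the given index, using the top-level
  // "id" column from the schema above as the Elasticsearch _id.
  def indexWithId(df: DataFrame, resource: String): Unit =
    df.saveToEs(resource, esOptions("id"))
}
```

From the spark-shell this could be called as EsIndexSketch.indexWithId(df, "people"). Older Elasticsearch versions expect the resource in index/type form instead of a bare index name.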

Thanks James, it worked fine. I have used the connector for a long time now and am a big fan. Thanks for all the hard work.

Happy to hear! Cheers!
