How to write parent-child relationships and has_child queries in Spark?

yuCompHW · March 11, 2016, 8:45am

Hello

I am a beginner with Elasticsearch-Spark.
My data is created in a Spark job and then stored in Elasticsearch.
My problem is that I have two types, "branch" and "employee", which are in a parent-child relationship.
I cannot find a way to define that parent-child relationship when writing data from Spark to Elasticsearch.

In detail, I have the following mapping defined in Elasticsearch:
curl -XPUT 'localhost:9200/company?pretty=1' -d'
{
  "mappings": {
    "branch": {},
    "employee": {
      "_parent": {
        "type": "branch"
      }
    }
  }
}'

Then I index several documents of type branch:
curl -XPOST 'localhost:9200/company/branch/_bulk?pretty' -d '
{ "index": { "_id": "london" }}
{ "name": "London Westminster", "city": "London", "country": "UK" }
{ "index": { "_id": "liverpool" }}
{ "name": "Liverpool Central", "city": "Liverpool", "country": "UK" }
{ "index": { "_id": "paris" }}
{ "name": "Champs élysées", "city": "Paris", "country": "France" }
'

Next, I insert a document of type employee into the company index, with london as its parent:
curl -XPUT 'localhost:9200/company/employee/1?parent=london' -d'
{
  "name": "Alice Smith",
  "dob": "1970-10-24",
  "hobby": "hiking"
}'

Then, inside spark-shell:
scala> val options = Map("pushdown" -> "true", "es.read.metadata" -> "true")
scala> val esDf = sqlContext.read.format("org.elasticsearch.spark.sql").options(options).load("company/employee")
scala> esDf.printSchema
root
 |-- dob: timestamp (nullable = true)
 |-- hobby: string (nullable = true)
 |-- name: string (nullable = true)
 |-- _metadata: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

No parent information is shown in the schema.
Therefore, how can I specify "parent=london" in Spark when using functions like saveToEs?
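
For reference, one direction that might work is the connector's `es.mapping.parent` setting, which is documented as naming the field that holds the parent id. The sketch below only builds the document and the settings map. The `branch` field is a hypothetical helper field that is not part of the mapping above, and the use of `es.mapping.exclude` to keep it out of the stored document is an assumption that has not been verified against version 2.2.0.

```scala
// Sketch only: "branch" is a hypothetical helper field that carries the
// parent id; it is not part of the employee mapping shown earlier.
val employee = Map(
  "name"   -> "Alice Smith",
  "dob"    -> "1970-10-24",
  "hobby"  -> "hiking",
  "branch" -> "london"
)

// es-hadoop settings (assumed to apply to this connector version):
// read the parent id from the "branch" field and leave that field
// out of the document body that gets indexed.
val parentSettings = Map(
  "es.mapping.parent"  -> "branch",
  "es.mapping.exclude" -> "branch"
)
```

In spark-shell, after `import org.elasticsearch.spark._`, this map would presumably be passed as `sc.makeRDD(Seq(employee)).saveToEs("company/employee", parentSettings)`. The connector documentation also describes a `<CONSTANT>` form, so `"es.mapping.parent" -> "<london>"` may work when every document shares the same parent.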
Also, can I run has_child and has_parent queries against Elasticsearch through Spark? If so, which functions or configuration settings should I use?
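
To make the second question concrete, this is the kind of query in question, written as a minimal sketch that assumes the connector forwards a query DSL string given through its `es.query` setting. The query itself (branches with at least one employee whose hobby is hiking) is only an illustration, and whether `has_child` behaves correctly through the connector is exactly what is unclear.

```scala
// Query DSL in Elasticsearch 2.x syntax: branches that have at least
// one employee whose hobby matches "hiking". Illustrative query only.
val hasChildQuery =
  """{"query": {"has_child": {"type": "employee", "query": {"match": {"hobby": "hiking"}}}}}"""

// Read options: same as in the spark-shell session, plus the query,
// which is assumed to be handed to the connector through "es.query".
val readOptions = Map(
  "pushdown"         -> "true",
  "es.read.metadata" -> "true",
  "es.query"         -> hasChildQuery
)
```

These options would presumably be used as `sqlContext.read.format("org.elasticsearch.spark.sql").options(readOptions).load("company/branch")`, or, with `import org.elasticsearch.spark._`, as `sc.esRDD("company/branch", hasChildQuery)`.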

I am using Elasticsearch 2.2.0, Spark 1.6.0, and elasticsearch-spark_2.10-2.2.0.jar.

Many thanks.
