Reindexing Elasticsearch index with parent and child relationship


(Immo Benjes) #1

Hi

we currently have a 'message' that can have a link to a 'parent' message. E.g. a reply would have the original message as the parent_id.

PUT {
  "mappings": {
    "message": {
      "properties": {
        "subject": {
          "type": "text"
        },
        "body": {
          "type": "text"
        },
        "parent_id": {
          "type": "long"
        }
      }
    }
  }
}

Until now we haven't used an Elasticsearch parent/child join on these documents, because parent and child weren't allowed to be of the same type. Now, with 5.6 and Elastic's drive to get rid of types, we are trying to use the new parent/child join field available in 5.6.

PUT {
  "settings": {
    "mapping.single_type": true
  },
  "mappings": {
    "message": {
      "properties": {
        "subject": {
          "type": "text"
        },
        "body": {
          "type": "text"
        },
        "join_field": {
          "type": "join",
          "relations": {
            "parent_message": "child_message"
          }
        }
      }
    }
  }
}

I know I will have to create a new index for this and then reindex everything with _reindex but I am not quite sure how I would do that.

If I index a parent_message it is simple

PUT localhost:9200/testm1/message/1
{
  "subject": "Message 1",
  "body": "body 1"
}

For the child message I have to provide the routing to the parent:

PUT localhost:9200/testm1/message/3?routing=1
{
  "subject": "Message Reply to 1",
  "body": "body 3",
  "join_field": {
    "name": "child_message",
    "parent": "1"
  }
}

The index now has this data:

{
  "_index": "testm1",
  "_type": "message",
  "_id": "2",
  "_score": 1,
  "_source": {
    "subject": "Message 2",
    "body": "body 2"
  }
},
{
  "_index": "testm1",
  "_type": "message",
  "_id": "1",
  "_score": 1,
  "_source": {
    "subject": "Message 1",
    "body": "body 1"
  }
},
{
  "_index": "testm1",
  "_type": "message",
  "_id": "3",
  "_score": 1,
  "_routing": "1",
  "_source": {
    "subject": "Message Reply to 1",
    "body": "body 3",
    "join_field": {
      "name": "child_message",
      "parent": "1"
    }
  }
}

How do I do the _reindex now? I believe I have to create the join_field manually and add _routing to the child messages. I haven't used scripting much yet, so I am struggling to get the reindex working.
I've tried this:

{
	"source": {
		"index" : "testm"
	},
	"dest" :{
		"index" : "testmnew"
	},
	"script" : {
		"lang" : "painless",
		"source" : "if(ctx._source.parent_id != null){ctx._routing = ctx._source.parent_id; ctx._source.join_field.name=  params.name; ctx._source.join_field.parent = ctx._source.parent_id}",
		"params" : {
			"name": "child_message"
		}
	}
}

Which doesn't work (how do you create a nested object in a script?).
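As far as I can tell, the attempt above fails because join_field does not exist on the source document yet, so there is nothing to set .name or .parent on. A minimal sketch of one way around it, assuming parent_id holds the _id of the parent message and that java.util.HashMap is usable from Painless in this version: build the nested object as a map and assign the whole map to ctx._source.join_field. The body is sent to _reindex as before; the variable names pid and child are only illustrative.

```json
{
  "source": { "index": "testm" },
  "dest": { "index": "testmnew" },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.parent_id != null) { String pid = ctx._source.parent_id.toString(); Map child = new HashMap(); child.put('name', params.name); child.put('parent', pid); ctx._source.join_field = child; ctx._routing = pid; }",
    "params": {
      "name": "child_message"
    }
  }
}
```

The parent id is converted to a string here because document ids and routing values are strings; whether the numeric value would also be accepted is something I have not verified.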

I have also tried this:

{
	"source": {
		"index" : "testm"
	},
	"dest" :{
		"index" : "testmnew"
	},
	"script" : {
		"lang" : "painless",
		"source" : "if(ctx._source.parent_id != null){ctx._routing = ctx._source.parent_id; ctx._source.join_field=  params.cjoin}",
		"params" : {
			"cjoin" :{
				"name": "child_message",
				"parent": 1
			}
				
		}
	}
}

But the problem is that I cannot specify ctx._source.parent_id in the params (it is hard coded to 1 now).
How can I specify an id from the current document in the params?
Is this the correct way of doing the reindex?

Thanks

Immo


(Immo Benjes) #2

My new index now contains these documents:

{
"took": 2,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
},
"hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
        {
            "_index": "testmnew",
            "_type": "message",
            "_id": "2",
            "_score": 1,
            "_source": {
                "subject": "Mesage 2",
                "body": "Body 2"
            }
        },
        {
            "_index": "testmnew",
            "_type": "message",
            "_id": "1",
            "_score": 1,
            "_source": {
                "subject": "Mesage 1",
                "body": "Body 1"
            }
        },
        {
            "_index": "testmnew",
            "_type": "message",
            "_id": "3",
            "_score": 1,
            "_routing": "1",
            "_source": {
                "subject": "Mesage 3 reply to 1",
                "parent_id": 1,
                "body": "Body 3",
                "join_field": {
                    "parent": 1,
                    "name": "child_message"
                }
            }
        }
    ]
}
}

The child document with the ID 3 now has a _routing and join_field.

When I do a search and want to include the child messages I don't get anything:

{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        },
        {
          "has_child": {
            "type": "child_message",
            "query": {
              "match_all": {}
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}

That would indicate something went wrong during the reindex. The query itself could be wrong, but it worked on an index where I indexed the child messages directly with the correct routing.


(Immo Benjes) #3

Okay, I found the problem. You have to set the "join_field" on the parent as well. I seemed to remember that you only need it on the child, but that appears to be wrong.

{
	"source": {
		"index" : "testm"
	},
	"dest" :{
		"index" : "testmnew2"
	},
	"script" : {
		"lang" : "painless",
		"source" : "if(ctx._source.parent_id != null){ctx._routing = ctx._source.parent_id; ctx._source.join_field=  params.cjoin}else{ctx._source.join_field = params.parent_join}",
		"params" : {
			"cjoin" :{
				"name": "child_message",
				"parent": 1
			},
			"parent_join" : "parent_message"
				
		}
	}
}

Now the only remaining problem is how to set the parent ID dynamically (use a field from the current document instead of a hard-coded value in the params).
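One possible way to do this, as a sketch rather than a verified solution: params are fixed for the whole request, so the parent ID cannot come from them, but the script can read it from ctx._source for each document and build the child's join_field as a map. This assumes parent_id holds the _id of the parent message and that java.util.HashMap is usable from Painless in this version; the param names child_name and parent_name are made up for the example.

```json
{
  "source": { "index": "testm" },
  "dest": { "index": "testmnew2" },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.parent_id != null) { String pid = ctx._source.parent_id.toString(); Map child = new HashMap(); child.put('name', params.child_name); child.put('parent', pid); ctx._source.join_field = child; ctx._routing = pid; } else { ctx._source.join_field = params.parent_name; }",
    "params": {
      "child_name": "child_message",
      "parent_name": "parent_message"
    }
  }
}
```

Note that this only covers one level of replies. If a reply can itself be the parent of another reply, routing by the direct parent_id would probably not put the whole thread on the same shard, and the relations in the mapping would need another level as well.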

