Change parent for children

Hi!

I have an index with two types, Group and Person, where Group is the parent and Person is the child. Persons can change groups: for example, today Group1 contains Person1, Person2 and Person3, and tomorrow it will contain only Person1 and Person4.
So my solution was to delete the parents every day, then reinsert them and change the routing for the children (in my case every person is a member of a group).
My question is: is this possible, and how can I do it? If I delete the parent, why can't the child be "adopted" by another parent?

Hello,

Yes, you can do as follows (which I think is the order of operations you described):

PUT test_join
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "joint": {
        "type": "join",
        "relations": {
          "group": "person"
        }
      }
    }
  }
}

# index a Group
PUT test_join/_doc/1?refresh
{
  "text": "First day group",
  "joint": "group" 
}
# index a Person belonging to the Group
PUT test_join/_doc/2?routing=1&refresh
{
  "text": "Person who will move",
  "joint": {
    "name": "person", 
    "parent": "1" 
  }
}
# Delete the group
DELETE test_join/_doc/1
# Create the new Group
PUT test_join/_doc/3?refresh
{
  "text": "Second day group",
  "joint": "group" 
}
# Reindex the Person (update would work, too)
PUT test_join/_doc/2?routing=3&refresh
{
  "text": "Person who will move",
  "joint": {
    "name": "person", 
    "parent": "3" 
  }
}

Now let's do some basic validation:

# check the Person doc by GET
GET test_join/_doc/2
{
  "_index" : "test_join",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 2,
  "_seq_no" : 4,
  "_primary_term" : 1,
  "_routing" : "3",
  "found" : true,
  "_source" : {
    "text" : "Person who will move",
    "joint" : {
      "name" : "person",
      "parent" : "3"
    }
  }
}
# check the join by has_parent
GET /test_join/_search
{
  "query": {
    "has_parent": {
      "parent_type": "group",
      "query": {
        "match": {
          "text": "Second"
        }
      }
    }
  }
}
{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_join",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_routing" : "3",
        "_source" : {
          "text" : "Person who will move",
          "joint" : {
            "name" : "person",
            "parent" : "3"
          }
        }
      }
    ]
  }
}
# check the join by has_child
GET /test_join/_search
{
  "query": {
    "has_child": {
      "type": "person",
      "query": {
        "match": {
          "text": "Person"
        }
      }
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_join",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "text" : "Second day group",
          "joint" : "group"
        }
      }
    ]
  }
}

These results are as intended by the operations.


Be aware that if you try this on an index with more than one primary shard, you may end up with orphaned documents. The parent ID is used to route related documents to the same shard, and that shard may change when you change the parent. You may therefore need to delete the child before recreating it.
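
A minimal sketch of that delete-then-recreate order, reusing the IDs from the example above (7.x syntax). The routing=1 on the DELETE assumes the Person was originally indexed with its old parent's ID as the routing value:

```console
# Remove the old copy of the Person from the shard selected by the old parent's routing
DELETE test_join/_doc/2?routing=1&refresh

# Index the Person again under the new parent; routing must match the new parent ID
PUT test_join/_doc/2?routing=3&refresh
{
  "text": "Person who will move",
  "joint": {
    "name": "person",
    "parent": "3"
  }
}
```

If the routing value is left off the DELETE, the request is routed by the document ID instead, which may point at a different shard than the one holding the old copy.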


Wow, good catch!

Indeed, I was able to easily reproduce that scenario.

Create the original Group and Person, they both go to shard A.

Delete the original Group, create a new one, it goes to shard B.

Re-index the Person document, it goes to shard B, following the routing of the named parent.

Now match_all returns both copies of the Person. _cat/shards confirms 1 doc on shard A and 2 docs on shard B.
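
The two checks mentioned above can be run roughly as follows. The index name test_join is carried over from the earlier single-shard example and is an assumption here; the index would need more than one primary shard to show the problem:

```console
# Every stored copy is returned, including a stale Person left on the old shard
GET test_join/_search
{
  "query": {
    "match_all": {}
  }
}

# Per-shard document counts (see the docs column)
GET _cat/shards/test_join?v
```

Which shard each document lands on depends on the routing hash, so the split between shards may differ from the one described here.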

Sorry for missing that.

Yes, normally you have more than one shard, for performance reasons. So one solution is to delete the child document and reinsert it, since routing can only be changed by reinsertion.
We use ES 5.5, so we can have several types in the index, and we have 5 shards as well. I tried a similar solution yesterday and got duplicate docs. And changing 20,000,000 persons every day takes a little bit too long =)
So I have another idea: what if Person is the parent and Group is the child? Yes, it will be a 1-to-1 relation and a lot of the children will be identical, but it could be useful for me.
So I have a follow-up question:
I need the first 10 distinct group IDs, ordered by a field of the person (for example, date of birth).

|person|group|
|1|1|
|2|2|
|3|2|
|4|10|
|5|4|
|6|5|
|7|6|
|8|7|
|9|1|
|10|8|
|11|9|
|12|1|
|13|11|
|14|3|
|15|11|

For this example I should get the IDs 1, 2, 10, 4, 5, 6, 7, 8, 9, 11, in this order.
Do you have any idea how I can do it, or whether it can be done at all?

If I understand your proposal correctly, you are saying that, for example, if you have three people belonging to a group, the way you would model that would be to have three distinct pairs of documents, a Group and a Person, where the 3 Group documents are duplicates, with the Group being a child of the Person.

If this is the case, why not just keep the Group fields in the Person document? That's far simpler and faster to query (and will result in fewer obstacles to upgrading, which would be a very good idea).
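
As a rough sketch of that denormalized layout (7.x syntax): the index name person_group and the fields group_id and date_of_birth are the ones used in the aggregation example further down, while name, group_name and all values are hypothetical placeholders.

```console
# One Person document that carries its Group fields directly
PUT person_group/_doc/1?refresh
{
  "name": "Person1",
  "date_of_birth": "1990-01-01",
  "group_id": 1,
  "group_name": "Group1"
}

# Moving the Person to another Group is a partial update; no routing is involved
POST person_group/_update/1?refresh
{
  "doc": {
    "group_id": 2,
    "group_name": "Group2"
  }
}
```

The trade-off is that a change to a Group field has to be written to every Person document in that Group.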

For your query, since what you want is distinct group_ids, you might perform a terms aggregation on the group_id field. Since you want them in order by some person property, you might include a sub-agg and order by that. (The terms agg documentation explains ordering the buckets by values from metrics sub-aggregations.)
For example, to sort the group_id buckets by the date_of_birth field, you would have a Stats sub agg named dob.

Something like this works in v7.4; you may need to sort out the differences for v5.5.

POST person_group/_search
{
  "size": 0, 
  "aggs": {
    "group_id_buckets": {
      "terms": {
        "field": "group_id",
        "size": 10,
        "order": {"dob.min": "asc"}
      },
      "aggs": {
        "dob": {
          "stats": {
            "field": "date_of_birth"
          }
        }
      }
    }
  }
}

Yes, I already tried some aggregations today and they look good to me, but I hadn't seen the stats aggregation. OK, I will check on Monday.
As for keeping the group ID in the Person type: yes, I thought about it, but for our use case we need to keep these documents separate.
I had another problem with aggregations and filters! I tried a query string filter on the Person type and got 11 results, but when I applied the same query inside a HasParent query I got only 8, and in a different order. Do you have any idea why? I was filtering by person name.

The date_of_birth field is on the Person type, not on the Group type.

So now my query looks like this:

{
  "from": 0,
  "query": {
    "has_parent": {
      "parent_type": "persontype",
      "score": true,
      "query": {
        "function_score": {
          "query": {
            "common": {
              "name": {
                "query": "paulina cruz",
                "cutoff_frequency": 0.001,
                "low_freq_operator": "and"
              }
            }
          },
          "script_score": {
            "script": "_score * doc['dateOfBirth'].value"
          }
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": {
            "name": {},
            "dateOfBirth": {}
          }
        }
      }
    }
  },
  "aggs": {
    "unique_title": {
      "terms": {
        "field": "documentId",
        "include": {
          "partition": 1,
          "num_partitions": 3
        },
        "order": {
          "avg_score": "desc"
        },
        "size": 5
      },
      "aggs": {
        "avg_score": {
          "max": {
            "script": "_score"
          }
        },
        "bucket_count": {
          "cardinality": {
            "field": "documentId"
          }
        }
      }
    },
    "distinct_terms": {
      "cardinality": {
        "field": "documentId"
      }
    }
  },
  "_source": {
    "excludes": [
      "*"
    ]
  }
}

I have some questions:

  1. How can I sort child documents by several of the parent's fields? For example, I want to sort by score first and then by date of birth. If we have to use script_score, I suppose it's not possible?
  2. How can I implement pagination in the aggregations? I tried to use include, but I can't work out how to calculate num_partitions.
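
On the second question, a possible sketch based on the partition filtering described in the terms aggregation documentation: first estimate the number of distinct documentId values with a cardinality aggregation (the query above already contains one, distinct_terms), then choose num_partitions as roughly that estimate divided by the number of terms wanted per request. The index name my_index and the numbers are assumptions for illustration only: about 60 distinct values at about 20 terms per partition gives 3 partitions.

```console
# Step 1: estimate the number of distinct documentId values
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "distinct_terms": {
      "cardinality": {
        "field": "documentId"
      }
    }
  }
}

# Step 2: fetch one partition per request, with partition running from 0 to num_partitions - 1
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "unique_title": {
      "terms": {
        "field": "documentId",
        "include": {
          "partition": 0,
          "num_partitions": 3
        },
        "size": 30
      }
    }
  }
}
```

Terms are assigned to partitions by hash, so partitions are only roughly equal in size; setting size somewhat above the expected terms per partition helps avoid cutting one off. Any order applies within a single partition, so this gives chunks of the term set rather than globally sorted pages.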