Read performance issues for index using parent child relationship

The index uses parent child relationship. On an average a parent can have 2 children. Worst case is 10 children.All fields are indexed.

  1. Whenever we have a query that uses "has_child" or "has_parent" the
    query takes minimum 12s to run. How can we improve the query performance?Please find below a sample query to fetch all the children related to a particular parent:
GET <index-name>/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "has_parent": {
            "query": {
              "bool": {
                "filter": [
                  {
                    "terms": {
                      "parentId":  ["12347813"]
                    }
                  }
                ]
              }
            },
            "parent_type": "item", 
            "inner_hits": {
            }
          }
        }
      ]
    }
  }
}
  1. Interestingly enough, the following query returns in 1s. Like all fields, childId is indexed in the children docs.
GET <index-name>/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "childId":"zyjdjs"
          }
        },
        {
          "has_parent": {
            "query": {
              "bool": {
                "filter": [
                  {
                    "terms": {
                      "parentId":  ["12347813"]
                    }
                  }
                ]
              }
            },
            "parent_type": "item", 
            "inner_hits": {
            }
          }
        }
      ]
    }
  }
}


  1. It looked like specifying children filtering criteria increased the performance but this only works when the specified filter column has high cardinality. For something like a boolean field, where cardinality is only 2, its again worse. The following query takes 17s
GET <index-name>/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "parentId":"zyjdjs"
          }
        },
        {
          "has_parent": {
            "query": {
              "bool": {
                "filter": [
                  {
                    "terms": {
                      "condition":  "false"
                    }
                  }
                ]
              }
            },
            "parent_type": "item", 
            "inner_hits": {
            }
          }
        }
      ]
    }
  }
}

Looks like its going through all the children with condition=true instead of looping through only children associated with a parent. When we use profile=true, the match count was in millions.

  1. We have tried using has_child as well, and there the performance is worse.

  2. Is there any way to verify that parent child relationship is working fine for this index? Anything in profile/query plan that can give us a clue about this?

  3. Can we use nested documents along with parent child mapping?

  4. Are then any other ways to debug apart from explain and profile?

Parent-child queries are expensive. Does your use case absolutely require parent-child relationships, or could you flatten out the mappings and perform filter queries on indexed IDs of linked documents? That would give you the best performance.

Think something along the lines of:

{ 
  "id": "abc",
  "parent_id": "def",
  "children": [ "ghi", "jkl" ]
}

Depending on your use case, nested could be an option to consider, but without knowing more about your use case it's hard to make any recommendations as there are also tradeoffs with using nested structure.

@Kathleen_DeRusso the use case needs parent child relationship as we don't want to store parent level data repeatedly. Here parent docs are heavy and child docs are very light.

What I am not able to understand is that why is has_parent query looping over all the children - it should search children only associated with the parent. Can you please help me understand this?

It's doing a join under the hood, which is why it's very slow - we issue that warning in our documentation which is why my first suggestion was to flatten your index structure if possible to get the best performance. What's your specific use case? Would nested without parent-child join fields work?

Can you use a parent_id query to filter for just the children of the parent?

Yes, as mentioned in (3) the performance is still 17s if we use parent_id.
Even in (1) where there are no filters in child the performance is 12s.
Only if we add a child filter with high cardinality we get results in 1s.

Sorry, I cannot elaborate on the use case as its confidential. Although I can confirm that nested will not work

Even if you can not describe the use case in detail you can probably describe more general characteristics and elaborate on why nested documents will not work without divulging details about the type of data stored.

If you have a large and potentially complex document and some associated data, the main reasons to go for a parent child relationship is in my experience one of the following two:

  1. There is a high number of child documents which will dramatically grow the size if the parent if added as nested. The document would also get reindexed every time a child was added, which could be expensive.
  2. There are few child documents (seems to be your scenario) but these are frequently updated. If stored as nested documents with the parent, each update would cause the full document to be reindexed every time, which could be expensive.

If none of these apply I would recommend to try changing to nested documents and see how this affects performance.

If may also be useful to know a bit more about the environment. What is the size of the index? What is the shard size in terms of documents and size in GB? What is the size and specification of the cluster?

Ok let me try to describe why cant we use nested.

In nested docs, a change in the nested part(child) will need a rewrite to the whole nested part of the document. In our case the writers are different for different fields of the nested document. So if we go with nested approach, this might cause lost update problem. One writer might overwrite another writer's update. Hence we went ahead with parent child, as this give us an opportunity to keep having a single writer for each field.

Coming back to the main problem. We have already populated a huge index and repopulating it would be a bigger change for us. So before doing that we are trying to understand why parent-child is not working as we understand it. So we would like to verify our understanding.

If I am using has_parent to get children for a particular parent, ES would get the parent doc and find the assoicated child docs and return it. Is this understanding correct? If this is correct why first query takes 12 seconds.

Also another question is why the query response becomes faster when we add a child filter that is indexed.(second query)

And why it again becomes slow when we change the child filter to another indexed field that has less number of unique values across all rows.(Query 3)

How large is the index in terms of parent documents and size on disk in GB? How many primary shards do you have?

What is the specification of the cluster in terms of CPU cores, RAM, heap size and type of storage used?

If you have multiple parts of your application writing and updating data in the index I would probably recommend exposing this as an API instead of spreading the update logic out. This would allow you to control updates better and might allow you to move to nested documents which does not necessarily have as high overhead as the parent-child relationship.

  • Index size is around 3.6TB (excluding replicas) and we have two replicas.
    Hence total size is 11TB. Total doc count is 3B
  • We have 42 shards
  • 42 nodes 16 core each
  • Heap is 55% utilized on an average

Christian as mentioned earlier - Before making any big design change we would like to understand why is parent child relationship not working.
We would really appreciate if you could help us answer the below questions -

  1. If I am using has_parent to get children for a particular parent, ES would get the parent doc and find the assoicated child docs and return it. Is this understanding correct? If this is correct why first query takes 12 seconds.

  2. Also another question is why the query response becomes faster when we add a child filter that is indexed.(second query)

  3. And why it again becomes slow when we change the child filter to another indexed field that has less number of unique values across all rows.(Query 3)

All the queries mentioned are explained in detail in the question

I generally do not use parent-child very often so although I know it adds a lot of overhead at query time and is a lot slower (which is why I avoid it as far as I can) I am not the right person to discuss internals. Based on previous experience I would not say that what you are seeing necessarily is unexpected, although the latencies sound very large in general.

If I calculate correctly it seems you have an average shard size of 85GB, which is quite large. I believe a lot of improvements have been made in Elasticsearch 8 that allow queries to execute in parallel across segments but in older versions each query is executed in a single thread per shard. If you are not on the latest version I would recommend upgrading and see what impact that has on query performance. If this is not possible for some reason I would recommend splitting the index into a larger number of primary shards in order to increase the level of parallelism as you have a large number of CPU cores per node. I would probably start by splitting by a factor of 4 to see what impact that has. As you potentially have more data per node than can fit in the operating system page cache (~3*85GB) I would also recommend checking I/O stats and await to see if storage performance might be affecting performance.

1 Like