Delete by query for nested docs for ES version 7.x

Delete-by-query is generally somewhat performance-intensive in ES.
Would you recommend using it for nested docs too?
Also, is there a different delete-by-query syntax when the field being queried is present only in parent docs or only in child docs?
We do not want orphaned children or childless parents after a delete-by-query; whole families of docs should either live together or be deleted together.

As far as I know, delete-by-query only deletes complete documents. If you want to drop selected nested documents, that is really an update to the document, which causes it and the remaining nested documents to be reindexed.

I do not want to delete select children from a parent.
I want to drop complete families with DBQ.
Example:

    {
       "id": "root-1",
       "age": 70,
       "children": [
           {
               "id": "child-1",
               "children": [{
                  "id": "grandchild-1"
              },
              {
                  "id": "grandchild-2"
              }
              ]
          }
       ]
    },
    {
       "id": "root-2",
       "age": 25,
       "children": []
    },
    {
       "id": "root-3",
       "age": 35,
       "children": [
           {
               "id": "child-3"
          }
       ]
    }

Now, I do not want to delete any children or grandchildren independently here.
Rather, I want to delete whole families by query.
For the above example, is it valid to issue a DBQ like:
DELETE FROM some_table WHERE age > 35
(Assume an Elasticsearch query instead of SQL. I just used SQL for simplicity.)
Will the above delete query delete root-1 and all its children and grandchildren without any issues?

If you are using parent-child relationships there is no cascade delete in Elasticsearch. Only the documents matching the search criteria will be deleted, not other documents that are related or in the same hierarchy.
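For the parent-child (join field) case, one way to remove a parent together with its direct children in a single request is to match either the parent itself or any child whose parent matches. Below is a minimal sketch of such a request body. The relation name `root` and the index name `families` are hypothetical and not taken from this thread, and grandchildren would need a further nested `has_parent` level.

```python
import json

def family_delete_query(min_age, parent_type="root"):
    """Build a _delete_by_query body that matches parents and their direct children.

    parent_type is an assumed join relation name; adjust it to the real mapping.
    """
    parent_match = {"range": {"age": {"gt": min_age}}}
    return {
        "query": {
            "bool": {
                "should": [
                    # the parent documents themselves
                    parent_match,
                    # child documents whose parent matches the same condition
                    {"has_parent": {"parent_type": parent_type, "query": parent_match}},
                ],
                "minimum_should_match": 1,
            }
        }
    }

body = family_delete_query(35)
# Would be sent as: POST /families/_delete_by_query  (index name is an assumption)
print(json.dumps(body, indent=2))
```

Note that the plain `range` clause would also match any child document that itself carries an `age` above the threshold; in the example above the children have no `age` field, so that does not arise.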

So how do I delete all children along with the parent?
It does not make sense that deleting a parent leaves orphaned children in the index.

IMO there are the following possible solutions to this problem (assuming N children per parent):

  1. ES automatically prevents children from being orphaned. If you delete a parent, its children are automatically deleted - Best approach IMO
  2. User deletes all children first - then deletes the parent. (Requires N+1 queries)
  3. User sends a doc having just the parent with an empty children array. This "update" operation gets rid of all the parent's children. And then the user deletes the parent in yet another query. (Requires 2 queries)

Can you please clarify which of the above is the recommended solution?

Thanks

Elasticsearch does not work that way. You are assuming that having orphaned children is a bad thing that is to be avoided, but I have seen use cases where this is used to be able to create children before the parent is available, so it is not clear that this is something that can or should be generally enforced.

That will work. Can be done in 2 requests though, one to identify the parent and children and one (possibly bulk) to delete them. I suspect this is what delete by query does behind the scenes.
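The second of those two requests could be assembled roughly as follows. This is a sketch only: it assumes the first request was a search whose hits carry `_id` (and `_routing` for child documents indexed with routing), and the index name `families` and the sample hits are made up for illustration.

```python
import json

def bulk_delete_lines(index, hits):
    """Turn search hits into an NDJSON payload of delete actions for the _bulk API."""
    lines = []
    for hit in hits:
        action = {"delete": {"_index": index, "_id": hit["_id"]}}
        # Child documents in a join mapping are routed by their parent,
        # so the delete has to carry the same routing value.
        if "_routing" in hit:
            action["delete"]["routing"] = hit["_routing"]
        lines.append(json.dumps(action))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical hits, as a search for one family might return them.
hits = [
    {"_id": "root-1"},
    {"_id": "child-1", "_routing": "root-1"},
]
payload = bulk_delete_lines("families", hits)
print(payload)
```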

I do not understand this. Are you mixing up and confusing nested documents with parent-child relationships? If so, this blog post may help clarify things even if the implementation of parent-child has changed a bit.

Use delete by query. All the documents need to be found and deleted, which is exactly what it does. I cannot think of a more efficient way.

In the JSON snippet I gave above (the one having root-1, root-2, root-3 etc in the snippet), you can see that some of the children do not have the field age.

Can you please provide a delete-by-query for that use case, where I want to delete all parents whose age is greater than 35 while ensuring that no child is orphaned? (My use case does not tolerate orphans.)
If a single delete-by-query will not work, then how do I go about achieving the above?
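If each family is stored as one document with its `children` embedded (nested or plain objects), a query on the root-level `age` field is enough: delete-by-query removes the whole matching document, embedded children included, so the missing `age` on the children does not matter. Below is a minimal sketch of the request body, followed by a purely local simulation on the sample data from this thread. The simulation is a stand-in for illustration and not how Elasticsearch evaluates the query.

```python
import json

# Sample families from the thread, each one a single document.
docs = [
    {"id": "root-1", "age": 70, "children": [
        {"id": "child-1", "children": [{"id": "grandchild-1"}, {"id": "grandchild-2"}]}
    ]},
    {"id": "root-2", "age": 25, "children": []},
    {"id": "root-3", "age": 35, "children": [{"id": "child-3"}]},
]

# Body for: POST /<index>/_delete_by_query
delete_body = {"query": {"range": {"age": {"gt": 35}}}}
print(json.dumps(delete_body))

def simulate_delete(documents, min_age):
    """Local stand-in for the range query: drop whole root documents with age > min_age."""
    return [d for d in documents if not d.get("age", 0) > min_age]

remaining = simulate_delete(docs, 35)
print([d["id"] for d in remaining])  # ['root-2', 'root-3']
```

Only root-1 is removed here, since root-3 has an age of exactly 35 and the condition is strictly greater than 35.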

So far, what I have gathered is that ES deletes the documents that match the query - period.
A con of this approach for some users is that orphans would be left in ES.
While a pro is that ES gives you the ability to delete "some" children that match the query.

However, if that is true, then an interesting question is to know what happens when you delete a child having children of its own. Example:

    {
       "id": "root-1",
       "age": 70,
       "has_car": false,
       "children": [
           {
               "id": "child-1",
               "has_car": true,
               "children": [{
                  "id": "grandchild-1"
              },
              {
                  "id": "grandchild-2"
              }
              ]
          }
       ]
    }

If my delete-by-query is "has_car": true, then what happens to "grandchild-1" and "grandchild-2"?
Do they become children of "root-1"? That would be totally weird, I think.

It looks like I may have misunderstood your use case, as you are not using the parent-child feature but nested documents. If that is the case, you will most likely need to use a scripted update using Painless to maintain the structure you are aiming for.

Can you please provide a link to an example using Painless?
We do not want orphaned docs at any point in time, and we want to delete whole families of docs using queries on the parents' fields only.

@Christian_Dahlqvist - a link would be very helpful

I do not have any examples.
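For later readers, here is a rough, untested sketch of what a scripted update for the "has_car" case might look like. It assumes `children` is mapped as `nested` and that the body is sent to `POST /<index>/_update_by_query`; the script line is an assumption about how one would write it in Painless, not something confirmed in this thread. The Python function below it only mimics the script locally, to show that the grandchildren disappear together with the child that contains them and are not re-parented.

```python
import json

# Assumed body for: POST /<index>/_update_by_query
update_body = {
    "query": {
        "nested": {
            "path": "children",
            "query": {"term": {"children.has_car": True}},
        }
    },
    "script": {
        "lang": "painless",
        # Remove every direct child whose has_car equals the parameter.
        "source": "ctx._source.children.removeIf(c -> c.has_car == params.has_car)",
        "params": {"has_car": True},
    },
}
print(json.dumps(update_body, indent=2))

def remove_children(doc, has_car):
    """Local imitation of the script: drop direct children with the given has_car value."""
    updated = dict(doc)
    updated["children"] = [c for c in doc["children"] if c.get("has_car") != has_car]
    return updated

root = {
    "id": "root-1", "age": 70, "has_car": False,
    "children": [
        {"id": "child-1", "has_car": True,
         "children": [{"id": "grandchild-1"}, {"id": "grandchild-2"}]}
    ],
}
after = remove_children(root, True)
print(after["children"])  # []
```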
