Advice on bookmarking docs

We have a lot of docs like this:

{
"_type": "doc",
"_id": "123",
"_source": {
"parent_name": "abc"
}
}

Each doc has exactly one parent_name, but multiple docs can share the same
parent. It is like a many-to-one relationship, but the parent has no other
info apart from its name, so we didn't create a separate doc for it.

Now we want to let our users bookmark parents so they can later query only
the docs that are children of their bookmarked parents.
We could easily do that with a terms filter like this:

{
"filter": {
"terms": {
"parent_name": [
"abc",
"def",
"ghi"
]
}
}
}

We could pass the filter all of the user's bookmarked parent names, which
are persisted, let's say, in a relational database.

But the problem is that we have more than 50 million docs and a user can
bookmark millions of parents. Sending a filter with millions of terms in
every request would be too heavy, so we need to handle the bookmarks
directly in Elasticsearch.
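
To get a feel for how heavy such a request would be, here is a minimal Python sketch that builds the terms filter above and measures its JSON size. The parent names are synthetic placeholders (an assumption, not our real data), so the result is only a rough indication.

```python
import json


def build_terms_filter(parent_names):
    """Build the terms filter body shown above for the given parent names."""
    return {"filter": {"terms": {"parent_name": list(parent_names)}}}


def request_size_bytes(body):
    """Size of the JSON-encoded request body in bytes."""
    return len(json.dumps(body).encode("utf-8"))


# Hypothetical bookmark list: synthetic names standing in for real parents.
names = ["parent-%07d" % i for i in range(1000000)]
body = build_terms_filter(names)
size_mb = request_size_bytes(body) / (1024.0 * 1024.0)
print("terms: %d, request body: %.1f MB" % (len(names), size_mb))
```

With these 14-character names each term costs about 18 bytes once quoted and separated, and that payload would be sent and parsed again on every search.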

We considered using a filtered alias, so that the very same filter is
persisted in Elasticsearch and we don't have to pass it in every request.
That would already be much better than sending the filter each time, but we
want more: we want it to be very fast. Filtering with millions of terms
would be slow even if we don't need to send the filter in the request.

So we decided to add a meta field to our docs with information like "who
bookmarked me", something like this:

{
"_type": "doc",
"_id": "123",
"_source": {
"parent_name": "abc",
"bookmarked_by": [
"roger",
"john"
]
}
}

Then we can use a term (term, without the "s") filter like this:

{
"filter": {
"term": {
"bookmarked_by": "roger"
}
}
}

That would be (I hope) much faster than our last approach, but it still has
issues.

The problem we would have now is about updating bookmarks.
When the user bookmarks/un-bookmarks a parent, we can do a query for all
docs with this parent and update their "bookmarked_by" field with the user
identifier. That is ok.
But what happens when we add a new doc with a parent the user bookmarked
before?
We could query for the other docs with the same parent and copy the
bookmarked_by field to the new doc, but that is ugly.
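
The bookmark/un-bookmark step could be sketched as below: given the docs returned by a query on parent_name, build one bulk request with a partial update per doc. This is only a sketch under assumptions: the index name "docs" and the shape of the hits list are made up for illustration, and the new bookmarked_by list is computed on the client, so concurrent updates to the same doc could overwrite each other.

```python
import json


def add_user(bookmarked_by, user):
    """Return the list with user added, without duplicates."""
    return bookmarked_by if user in bookmarked_by else bookmarked_by + [user]


def remove_user(bookmarked_by, user):
    """Return the list with user removed."""
    return [u for u in bookmarked_by if u != user]


def bulk_update_body(index, hits, change, user):
    """Build a newline-delimited bulk body: one partial update per doc.

    hits is a list of (doc_id, current_bookmarked_by) pairs, e.g. taken
    from a query for all docs with a given parent_name (hypothetical shape).
    """
    lines = []
    for doc_id, current in hits:
        action = {"update": {"_index": index, "_type": "doc", "_id": doc_id}}
        partial = {"doc": {"bookmarked_by": change(current, user)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(partial))
    return "\n".join(lines) + "\n"  # the bulk body ends with a newline


# "roger" bookmarks parent "abc", which currently has two docs.
hits = [("123", ["john"]), ("124", [])]
print(bulk_update_body("docs", hits, add_user, "roger"))
```

Un-bookmarking is the same call with remove_user instead of add_user.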

So we concluded we need to have the bookmarked_by field centralized in a
parent doc.

We considered the following approaches:

1 - parent-child relationship

{
"_type": "parent",
"_id": "1",
"_source": {
"bookmarked_by": [
"roger",
"john"
]
}
}

{
"_type": "child",
"_id": "1",
"_parent": "1",
"_source": {}
}
{
"_type": "child",
"_id": "2",
"_parent": "1",
"_source": {}
}

Then, when user "roger" does a query on the children, the query would also
have a has_parent filter like this:

{
"has_parent": {
"parent_type": "parent",
"filter": {
"term": {
"bookmarked_by": "roger"
}
}
}
}
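
As a rough illustration of how this filter would be combined with the user's own query, here is a small Python sketch that wraps both in a filtered query. The filtered wrapper and the match_all placeholder are assumptions chosen to match the filter-based DSL used in this post; the real user query would go in their place.

```python
import json


def bookmarked_children_query(user, actual_query):
    """Wrap the user's query so it only matches children whose parent
    doc lists this user in bookmarked_by."""
    return {
        "query": {
            "filtered": {
                "query": actual_query,
                "filter": {
                    "has_parent": {
                        "parent_type": "parent",
                        "filter": {"term": {"bookmarked_by": user}},
                    }
                },
            }
        }
    }


# match_all stands in for whatever the user is actually searching for.
body = bookmarked_children_query("roger", {"match_all": {}})
print(json.dumps(body, indent=2))
```

The request stays small no matter how many parents the user has bookmarked; the cost moves to the join Elasticsearch has to do at query time.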

2 - nested type

{
"_type": "parent",
"_id": "1",
"_source": {
"bookmarked_by": [
"roger",
"john"
],
"children": [
{
"id": 1
},
{
"id": 2
}
]
}
}

Then, when user "roger" does a query, we use a nested query to query only
the children with bookmarked parents:

{
"filtered": {
"query": {
"nested": {
"path": "children",
"query": <actual_query>
}
},
"filter": {
"term": {
"bookmarked_by": "roger"
}
}
}
}

3 - No actual joins approach

{
"_type": "parent",
"_id": "1",
"_source": {
"name": "abc",
"bookmarked_by": [
"roger",
"john"
]
}
}

{
"_type": "child",
"_id": "1",
"_source": {
"parent_name": "abc",
"bookmarked_by": [
"roger",
"john"
]
}
}

{
"_type": "child",
"_id": "2",
"_source": {
"parent_name": "abc",
"bookmarked_by": [
"roger",
"john"
]
}
}

Then, every time a parent gets updated, we query for all its children
(using the parent_name field) and update their bookmarked_by fields to
reflect the updated parent's bookmarked_by field.
And every time we add a new child doc we query for its parent and copy the
parent's bookmarked_by field to the new doc
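
The bookkeeping for this third approach can be modelled in plain Python, with dictionaries standing in for the two doc types. This is a sketch of the synchronization logic only, not of any Elasticsearch call; the class and method names are made up for illustration.

```python
class BookmarkIndex(object):
    """In-memory model: parent docs are the source of truth and
    children carry a denormalized copy of bookmarked_by."""

    def __init__(self):
        self.parents = {}   # parent name -> set of users
        self.children = {}  # child id -> {"parent_name", "bookmarked_by"}

    def _sync_children(self, parent_name):
        # "Query" all children of this parent and overwrite their copy.
        users = sorted(self.parents.get(parent_name, set()))
        for child in self.children.values():
            if child["parent_name"] == parent_name:
                child["bookmarked_by"] = list(users)

    def bookmark(self, user, parent_name):
        self.parents.setdefault(parent_name, set()).add(user)
        self._sync_children(parent_name)

    def unbookmark(self, user, parent_name):
        self.parents.get(parent_name, set()).discard(user)
        self._sync_children(parent_name)

    def add_child(self, child_id, parent_name):
        # A new child copies bookmarked_by from its parent, if any.
        users = sorted(self.parents.get(parent_name, set()))
        self.children[child_id] = {
            "parent_name": parent_name,
            "bookmarked_by": users,
        }

    def search(self, user):
        # Equivalent of the single term filter on bookmarked_by.
        return sorted(cid for cid, child in self.children.items()
                      if user in child["bookmarked_by"])


index = BookmarkIndex()
index.add_child("1", "abc")
index.bookmark("roger", "abc")
index.add_child("2", "abc")  # added after the bookmark, still picked up
print(index.search("roger"))
```

The search side is a single term filter, but every bookmark change rewrites all children of that parent, which is where the write cost of this approach lies.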

The main problem with the first two approaches is the need to do a join at
query time. I didn't test them, but I think that joining across millions of
docs could be much slower than not joining at all.
Also, the nested type approach has the issue of returning the parent doc on
queries, and we need to return the matching children only.
The third approach looks like the fastest one at query time, but it is
almost as ugly as not having the parent in a separate doc at all.

I may have put some wrong information here, as I didn't test every
approach. I'm only using common knowledge with some guessing, but I hope I
have described our problems well.

I would like some advice. Maybe I missed a better approach?

Thanks in advance
