Extremely slow has_child query (greater than 10 seconds)

Hi. I'm having trouble scaling a has_child query.

I am trying to search for all items that haven't been viewed by a user yet.

Assumptions:

  • 3 million users
  • 3 million items
  • 10 thousand items viewed per user
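
For scale, here is a back-of-the-envelope sketch of what those figures imply, taking them at face value and assuming one viewed_by child document per (user, item) view. The variable names are only for illustration.

```python
# Rough sizing from the three assumptions listed above.
users = 3_000_000
items = 3_000_000
views_per_user = 10_000

# Assumption: one viewed_by child document per (user, item) view.
total_children = users * views_per_user

# Average number of viewed_by children attached to each item.
children_per_item = total_children // items

print(f"{total_children:,} viewed_by documents in total")
print(f"{children_per_item:,} viewed_by children per item on average")
```

That works out to roughly 30 billion child documents, or about 10,000 per item on average.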

We have an item index in which each item has one viewed_by child document for every user that viewed it.

To query items that have not been seen yet, we use:

bool:
  must_not: [
    {has_child: {type: 'viewed_by', query: {term: viewed_by_user_id: CURRENT_USER_ID}}}
  ]

This kind of works: it's not fast, but it doesn't generate errors.

When we tried adding more parameters to the filter, so that only items with a particular status are excluded, searches stopped scaling and started timing out. Our timeout is set to 10 seconds.

bool:
  must_not: [
    {has_child: {type: 'viewed_by', query: {bool: {must: [
        {term: viewed_by_user_id: CURRENT_USER_ID}
        {term: status: 1}
    ]}}}}
  ]
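
For reference, a minimal Python sketch of the two clauses above written out as plain JSON-style dictionaries. The function name `unseen_items_clause` and the user id 42 are made up for illustration; the field names and the status value are taken from the snippets above.

```python
import json


def unseen_items_clause(user_id, status=None):
    """Build the must_not/has_child clause; add the status term only if given."""
    user_term = {"term": {"viewed_by_user_id": user_id}}
    if status is None:
        # Simple variant: exclude items with any viewed_by child for this user.
        child_query = user_term
    else:
        # Variant that times out for us: child must match user id AND status.
        child_query = {
            "bool": {"must": [user_term, {"term": {"status": status}}]}
        }
    return {
        "bool": {
            "must_not": [
                {"has_child": {"type": "viewed_by", "query": child_query}}
            ]
        }
    }


# 42 is a placeholder for CURRENT_USER_ID.
simple_clause = unseen_items_clause(42)
status_clause = unseen_items_clause(42, status=1)
print(json.dumps(status_clause, indent=2))
```

The only difference between the two variants is the inner query passed to has_child; the outer must_not is identical.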

During this time, node CPU usage is under 20%, with plenty of available memory. The cluster has 16 servers with 32 cores and 60 GB of RAM. Heap is limited to 30 GB.

Any suggestions on how to fix this, or where to look to figure out why CPU usage is so low?

Thanks in advance.

Have you tried using eager_global_ordinals?

https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-performance.html
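
In case it helps, here is a rough sketch of what that mapping might look like, assuming the older `_parent` mapping syntax with the `fielddata.loading` setting described on the page linked above. The parent type name `item` is a guess on my part, and newer Elasticsearch versions may configure this differently.

```python
import json

# Assumed mapping for the viewed_by child type, following the
# fielddata.loading setting from the guide page linked above.
# The parent type name "item" is a guess; use the real parent type.
viewed_by_mapping = {
    "viewed_by": {
        "_parent": {
            "type": "item",
            "fielddata": {"loading": "eager_global_ordinals"},
        }
    }
}

# Body that would be sent when creating the index with this mapping.
print(json.dumps({"mappings": viewed_by_mapping}, indent=2))
```

This moves the cost of building global ordinals from the first query after a refresh to the refresh itself.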

Hi Mike, thanks for the suggestion. I have not, but I will. What concerns me,
though, is that a simple has_child works just fine, but a slightly more
complicated one cripples the system. I would imagine that the cost of
building the ordinals list is the same regardless of the query complexity.