Extremely slow has_child query. greater than 10 seconds

emptyemail · August 11, 2015, 7:36pm

Hi. I'm having trouble scaling a has_child query.

I am trying to search for all items that haven't been viewed by a user yet.

Assumptions:

3 million users
3 million items
10 thousand items viewed per user.
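
For a sense of scale, the figures above can be multiplied out. This is only back-of-the-envelope arithmetic on the stated assumptions (one viewed_by child per user/item view); the variable and function names are made up for illustration.

```python
# Rough sizing from the assumptions listed above.
USERS = 3_000_000
ITEMS = 3_000_000
VIEWS_PER_USER = 10_000


def child_doc_count(users: int, views_per_user: int) -> int:
    """One viewed_by child document per (user, item) view."""
    return users * views_per_user


total_children = child_doc_count(USERS, VIEWS_PER_USER)
children_per_item = total_children / ITEMS  # average, assuming views are spread evenly

print(f"viewed_by child documents: {total_children:,}")
print(f"average children per item: {children_per_item:,.0f}")
```

So at full scale the has_child query is working against roughly 30 billion child documents cluster-wide.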

We have an item index with a child document viewed_by for each user that viewed the item.

To query items that have not been seen yet we use

bool:
  must_not: [
    {has_child: {type: 'viewed_by', query: {term: viewed_by_user_id: CURRENT_USER_ID}}}
  ]

This kinda works: it's not fast, but it doesn't generate errors.

When we tried adding more parameters to the filter, to only exclude items with a particular status, searches stopped scaling and started timing out. Our timeout is set to 10 seconds.

bool:
  must_not: [
    {has_child: {type: 'viewed_by', query: {bool: {must: [
        {term: viewed_by_user_id: CURRENT_USER_ID}
        {term: status: 1}
    ]}}}}
  ]
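
For anyone who wants to reproduce this, here is a minimal sketch of the two shapes above written as plain Python dicts that serialize to JSON. The function name `unseen_items_filter` is hypothetical; the field names (`viewed_by_user_id`, `status`) and the `viewed_by` type are the ones from the snippets above. It only builds the request fragment and does not talk to a cluster.

```python
import json


def unseen_items_filter(user_id, status=None):
    """Build the bool/must_not/has_child fragment from the post.

    With status=None this is the simple (fast enough) variant; with a
    status it is the variant that times out.
    """
    child_clauses = [{"term": {"viewed_by_user_id": user_id}}]
    if status is not None:
        child_clauses.append({"term": {"status": status}})

    # A single clause needs no bool wrapper, matching the first snippet.
    if len(child_clauses) == 1:
        child_query = child_clauses[0]
    else:
        child_query = {"bool": {"must": child_clauses}}

    return {
        "bool": {
            "must_not": [
                {"has_child": {"type": "viewed_by", "query": child_query}}
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(unseen_items_filter(42), indent=2))
    print(json.dumps(unseen_items_filter(42, status=1), indent=2))
```

Generating both bodies from one function makes it easier to time them side by side and confirm that the extra term clause is the only difference.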

During this time the node CPU is under 20% with plenty of available memory. The cluster has 16 servers with 32 cores and 60 GB of RAM. Heap is limited to 30 GB.

Any suggestions on how to fix this, or where to look to figure out why CPU usage is so low?

Thanks in advance.

msimos · August 12, 2015, 9:22pm

Have you tried using eager_global_ordinals?

https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-performance.html
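
As a rough sketch of what that would look like here: on the 1.x-era syntax described on that guide page, eager global ordinals are enabled on the child type's `_parent` field in the mapping. The parent type name `item` and the helper name are assumptions for illustration (only `viewed_by` appears in the original post), and the exact syntax is worth checking against the Elasticsearch version in use.

```python
import json


def viewed_by_mapping(parent_type="item"):
    """Mapping fragment that loads global ordinals for the _parent field
    eagerly (at refresh time) instead of lazily on the first query.

    parent_type is an assumed name for the item type.
    """
    return {
        "mappings": {
            parent_type: {},
            "viewed_by": {
                "_parent": {
                    "type": parent_type,
                    "fielddata": {"loading": "eager_global_ordinals"},
                }
            },
        }
    }


if __name__ == "__main__":
    # Body to send when creating the index.
    print(json.dumps(viewed_by_mapping(), indent=2))
```

This moves the cost of building global ordinals from the first query after a refresh to the refresh itself, so it mainly helps when the slow requests are the ones that arrive right after new views are indexed.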

emptyemail · August 12, 2015, 10:12pm

Hi Mike, thanks for the suggestion. I have not, but I will. What concerns me, though, is that a simple has_child works just fine, but a slightly more complicated one cripples the system. I would imagine that the cost of building the ordinals list is the same regardless of the query complexity.
