Slow terms aggregations

artursmet · August 17, 2015, 11:37am

Hi,

I have a problem with my ES deployment. My queries are using a few terms aggregations at once, and I have extremely low performance of queries (~8s for one query).

Cluster:

Two machines (each with two CPU cores and 13GB of memory)
ES has 6 GB of memory available
SSD storage
~9M documents

My aggregations are basically facets for products catalog, so I have high cardinality fields (brands, categories, etc).
I tried several configuration improvements (reducing number of shards, increasing memory, tuning cache) but I didn't manage to fix my issue.

When I perform a query with facets, both machines are using their CPUs at 100%. Memory usage and I/O load are low too.

My questions:

How to improve queries performance? I don't want to reduce number of used facets
Which approach is better for facetting - store unique IDs in documents or whole names (for example for categories)

Regards,
Artur

warkolm · August 17, 2015, 11:37am

Just to check, are you using facets or aggs?

artursmet · August 17, 2015, 11:39am

I'm using aggs.

Example aggregations from my query:

      "_filter_brand": {
     "aggs": {
        "brand": {
           "terms": {
              "field": "brand"
           }
        }
     },
     "filter": {
        "match_all": {}
     }
  },
  "_filter_categories": {
     "aggs": {
        "categories": {
           "terms": {
              "field": "category_ids",
              "size": 100
           }
        }
     },
     "filter": {
        "match_all": {}
     }
  }

Is there any way, to debug my queries? Explain returns only information why some results appeared as results.
I need information about how long it takes to for example fetch results from another node/shard etc.

msimos · August 17, 2015, 4:54pm

You can use the slow log to see the times for each phase:

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-slowlog.html#search-slow-log

It won't tell you why, but you can see how many shards were searched and how long it took.

artursmet · August 18, 2015, 6:57am

My query has large result set, so always all shards are involved. I tried to reduce number of shards/replicas and now I have 4 shards without any replicas (2 shards per node). My query time reduced to 7s but it's still unacceptable time for a search query for me.
I tried to split up my query into two queries (one for aggregations with count API and another with standard query) but always first query is very slow.

Any ideas, what should I do to increase the performance?

colings86 · August 18, 2015, 10:15am

It's hard to know what the problem could be without seeing your full query (if it's large please paste into a gist rather than pasting it here and provide a link to it). Also, I would try removing each aggregation one at a time and see if the performance improve to determine which aggregation(s) is causing the slow performance

artursmet · August 18, 2015, 11:07am

@colings86 here is my query: https://gist.github.com/artursmet/4c4369cdcf40923fc3b2#file-slow_query-json

I tried to remove my aggregations one by one and I saw that my author_ids and category_ids aggregations are the slowest.
They are arrays of integers (ids from another database).
I tried to hash them at index time and there is a little improvement, but still it's not so fast.

As I saw, the queries are using mostly CPU (both nodes have 100% cpu used, when my query is running)

colings86 · August 18, 2015, 11:30am

Why are you wrapping every aggregation in a filter aggregation which is a match_all? this is not going to make any difference to the result and only adds overhead to execution.

How many unique author_ids and category_ids are in the index?

artursmet · August 18, 2015, 12:21pm

This query was generated by elasicsearch-dsl-py library, but sure, it can be reduced.

As I mentioned before, I have a lot of unique author_ids and category_ids there.
Category ids (4200 unique), author_ids (over 2M).
How is the best practice for handling terms aggregations when field has many unique values?

jpountz · August 18, 2015, 12:45pm

4200 unique category ids is actually a low cardinality, it should execute very quickly. Since the query takes a long time to execute, could you try to use the nodes hot threads API to capture where elasticsearch is spending time while the query is running? Also it would be interesting to know if your fields are mapped as numerics or strings.

Even if I would be surprised if the filters explained the slowness, they might contribute a bit to it so it would be interesting to test without them.

artursmet · August 18, 2015, 1:13pm

I changed the query and removed filters part, but still there is no visible improvement.
Current query: https://gist.github.com/artursmet/273fa075d97711ea3edd
Categories and authors are mapped as integers:

"author_ids" : {
  "type" : "integer"
}

Brand field is mapped as not_analyzed string.

Hot threads looks interesting, I've never heard of it before. Here is the output from hot threads: https://gist.github.com/artursmet/c97170918f917dbe92f1 but I think that it won't tell me more than I know now. Aggregations are taking most of time.

Things like categories or authors, should be indexed as multi value field (string or numeric) or as nested field with some more data?

jpountz · August 21, 2015, 3:56pm

The hot threads look sane, I'm a bit puzzled when you are having such slow response times on 9M docs only. If you have some time for experimenting, you might want to try to enable doc values on those fields and/or to map them as strings instead of numerics (which have different execution paths for terms aggs).

artursmet · August 21, 2015, 4:01pm

@jpountz Thanks for reply.
I tried setting those fields for doc_values, one thing that I didn't check is mapping as strings instead of numeric. But I will check this path for sure.
I found a little work around - I split my query into two queries (facets for search_type=count endpoint and normal hits into standard _search. I also added timeout for my facet query, because my aggregation counters doesn't have to be super accurate.

warkolm · August 21, 2015, 10:39pm

FYI you should really call them aggs, not facets, otherwise you will confuse people

Topic		Replies	Views
Slow searches on a cluster Elasticsearch	3	877	July 5, 2017
Slow aggregation queries, only after data change (ES 2.3) Elasticsearch	9	1321	December 26, 2016
Filters and facets and tuning Elasticsearch	2	448	July 6, 2017
Very slow aggregation performance for trivial aggs Elasticsearch	5	8482	July 6, 2017
Slow Terms Aggregation Elasticsearch	4	1414	May 10, 2019

Slow terms aggregations

Related topics