Help Optimizing Terms Aggregation

sudhanne · August 2, 2018, 12:10am

Trying to design a query that will return customer data bucketed into various time ranges. The total number of documents being aggregated on will always be scoped to a given client, which in practice should never exceed a few thousand. It seems that this query takes ~3-4 seconds regardless of what I do to tune it and irrespective of the size of the client.

Based on my research, this delay is coming from building the global ordinals. This is a very high cardinality field. The only thing that speeds up the query is to enable eager_global_ordinals on the invoice_payer_contact_token field, but that causes our CPU / RAM usage to spike like crazy.

# customer aging report (full)
GET /invoices/invoice/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "invoice_unit_token": "KM9TQK5WVWXV3"
          }
        },
        {
          "term": {
            "invoice_state": "UNPAID"
          }
        }        
      ]
    }
  },
  "aggs": {
    "range": {
      "date_range": {
        "field": "invoice_due_on",
        "ranges": [
          {
            "from": "now/d-30d/d",
            "key": "0-30 days"
          },
          {
            "from": "now-60d/d",
            "to": "now/d-31d/d",
            "key": "31-60 days"
          },
          {
            "from": "now/d-90d/d",
            "to": "now/d-61d/d",
            "key": "61-90 days"
          },
          {
            "from": "now/d-120d/d",
            "to": "now/d-91d/d",
            "key": "91-120 days"
          },
          {
            "to": "now/d-91d/d",
            "key": "91 days or older"
          },          
          {
            "to": "now/d",
            "key": "all-time"            
          }                    
        ]
      },
      "aggs": {
        "customers": {
          "terms": {
            "field": "invoice_payer_contact_token",
            "size": 1000,
            "order": {
              "total_requested_amount": "desc"
            }
          },
          "aggs": {
            "total_requested_amount": {
              "sum": {
                "field": "invoice_requested_amount_cents"
              }
            }
          }
        }
      }
    }
  }
}

Mark_Harwood · August 2, 2018, 11:19am

If the number of customer is low you could try set the execution_hint on the customers agg to "map"

sudhanne · August 2, 2018, 2:51pm

I've tried this but unfortunately it does not help. There is almost no difference in query execution times. Customers is an extremely high cardinality field.

Is there a way that I could possibly break this query into multiple parts somehow to be more efficient?

system · August 30, 2018, 2:51pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Slow terms aggregations after use of eager_global_ordinals Elasticsearch	6	773	November 9, 2020
Terms aggregation is slow, setting eager_global_ordinals to True did not work Elasticsearch aggregations	2	292	February 22, 2024
Slow Terms Aggregation Elasticsearch	4	1414	May 10, 2019
Terms Aggregation performance high cardinality Elasticsearch	8	5222	July 5, 2017
Performance issue with terms sub-aggregations when using global ordinals Elasticsearch	1	623	January 7, 2020

Help Optimizing Terms Aggregation

Related topics