Performance Issue with Sub-Aggregations and Nested Document Structure?

techtom · February 26, 2017, 2:33pm

Hi,

We have aggregations with sub-aggregations over several levels and a nested document structure.
Running the aggregations takes up to 9 seconds with 20 million documents in the index.
We are wondering whether 9 seconds is "normal" given the sub-aggregations and the nested structure, or whether we are missing something.

This is what we have:

In our domain, we have orders and order items. An order contains one or more order items.
This is the mapping in Elasticsearch:

{
  "orders": {
    "mappings": {
      "order": {
        "properties": {
          "creationDate": {
            "type": "date"
          },
          "meansOfPaymentType": {
            "type": "keyword"
          },
          "price": {
            "type": "float"
          },
          "orderItem": {
            "type": "nested",
            "properties": {
              "cancelled": {
                "type": "boolean"
              },
              "quantity": {
                "type": "integer"
              },
              "totalPrice": {
                "type": "float"
              }
            }
          }
        }
      }
    }
  }
}

This is the query:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "creationDate": {
              "gte": "2012-01-01T00:00:00+01:00",
              "lt": "2017-02-01T00:00:00+01:00"
            }
          }
        },
        {
          "nested": {
            "path": "orderItem",
            "query": {
              "constant_score": {
                "filter": {
                  "bool": {
                    "filter": [
                      {
                        "term": {
                          "orderItem.cancelled": false
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "groupByInterval": {
      "date_histogram": {
        "field": "creationDate",
        "interval": "month",
        "time_zone": "CET",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2012-01-01T00:00:00",
          "max": "2017-01-31T23:59:59"
        }
      },
      "aggs": {
        "groupByReportType": {
          "terms": {
            "size": 50,
            "field": "meansOfPaymentType"
          },
          "aggs": {
            "sumTotalPrice": {
              "sum": {
                "field": "price"
              }
            },
            "orderItemsGroupedByReportType": {
              "nested": {
                "path": "orderItem"
              },
              "aggs": {
                "filteredOrderItems": {
                  "filter": {
                    "bool": {
                      "filter": [
                        {
                          "term": {
                            "orderItem.cancelled": false
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "sumTotalOrderItemPrice": {
                      "sum": {
                        "field": "orderItem.totalPrice"
                      }
                    },
                    "sumTotalOrderItemQuantity": {
                      "sum": {
                        "field": "orderItem.quantity"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Operating system: MacOSX El Capitan, 2,7 GHz Intel Core i5, 16 GB 1867 MHz DDR3
JVM-Heap: 8 GB
Number of primary shards: 2
Number of replica shards: 0

Number of orders: 10 000 000 (results in 20 001 980 documents in Elasticsearch because of the nested structure)
First order: 2012-01-01
Last order: 2017-01-31
Number of means of payment types: 4

What we've already tried:

increase/decrease JVM-Heap
increase/decrease number of primary shards
set property "index.store.preload" in elasticsearch.yml
use "eager_global_ordinals" in the mapping
set format of date to "epoch_second" in the mapping
use the official docker image of Elasticsearch

None of these adjustments lead to improved performance.

Varying the aggregations leads to the following times:

omit date histogram: 877ms
omit filter/aggregations on order item level: 1055ms
omit date histogram and filter/aggregations on order item level: 319ms
change interval of date histogram from "month" to "year": 2111ms

Do you have any ideas how to improve performance, or is this the performance one would expect given the aggregations (date-histogram, terms, sum) and the nested document structure?

Thanks!

spinscale · February 27, 2017, 9:52am

Hey,

you could give the Profile API a shot (the API is in Elasticsearch, but the easy-to-use UI is part of x-pack and free to use with a basic license).

https://www.elastic.co/guide/en/x-pack/current/xpack-profiler.html

--Alex

techtom · February 27, 2017, 6:59pm

Hey Alex,

thanks for your reply!
I will definitely try it out.

system · March 27, 2017, 6:59pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Nested object aggregation performance issues Elasticsearch	1	531	March 8, 2021
Nested aggregation got too slow result Elasticsearch	12	562	May 22, 2018
Aggregation on nested document always takes 2-3 seconds? Elasticsearch	2	527	July 6, 2017
Strange performance of terms aggregation on nested documents Elasticsearch	1	498	February 4, 2020
Extremely slow nested aggregations, need suggestions on modeling/shards Elasticsearch	1	482	July 5, 2017

Performance Issue with Sub-Aggregations and Nested Document Structure?

Related topics