Hi,
We have aggregations with sub-aggregations over several levels and a nested document structure.
Running the aggregations takes up to 9 seconds with 20 million documents in the index.
We are wondering whether 9 seconds is "normal" given the sub-aggregations and the nested structure, or whether we are missing something.
This is what we have:
In our domain, we have orders and order items. An order contains one or more order items.
This is the mapping in Elasticsearch:
{
  "orders": {
    "mappings": {
      "order": {
        "properties": {
          "creationDate": {
            "type": "date"
          },
          "meansOfPaymentType": {
            "type": "keyword"
          },
          "price": {
            "type": "float"
          },
          "orderItem": {
            "type": "nested",
            "properties": {
              "cancelled": {
                "type": "boolean"
              },
              "quantity": {
                "type": "integer"
              },
              "totalPrice": {
                "type": "float"
              }
            }
          }
        }
      }
    }
  }
}
This is the query:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "creationDate": {
              "gte": "2012-01-01T00:00:00+01:00",
              "lt": "2017-02-01T00:00:00+01:00"
            }
          }
        },
        {
          "nested": {
            "path": "orderItem",
            "query": {
              "constant_score": {
                "filter": {
                  "bool": {
                    "filter": [
                      {
                        "term": {
                          "orderItem.cancelled": false
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "groupByInterval": {
      "date_histogram": {
        "field": "creationDate",
        "interval": "month",
        "time_zone": "CET",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2012-01-01T00:00:00",
          "max": "2017-01-31T23:59:59"
        }
      },
      "aggs": {
        "groupByReportType": {
          "terms": {
            "size": 50,
            "field": "meansOfPaymentType"
          },
          "aggs": {
            "sumTotalPrice": {
              "sum": {
                "field": "price"
              }
            },
            "orderItemsGroupedByReportType": {
              "nested": {
                "path": "orderItem"
              },
              "aggs": {
                "filteredOrderItems": {
                  "filter": {
                    "bool": {
                      "filter": [
                        {
                          "term": {
                            "orderItem.cancelled": false
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "sumTotalOrderItemPrice": {
                      "sum": {
                        "field": "orderItem.totalPrice"
                      }
                    },
                    "sumTotalOrderItemQuantity": {
                      "sum": {
                        "field": "orderItem.quantity"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Operating system: MacOSX El Capitan, 2,7 GHz Intel Core i5, 16 GB 1867 MHz DDR3
JVM-Heap: 8 GB
Number of primary shards: 2
Number of replica shards: 0
Number of orders: 10 000 000 (results in 20 001 980 documents in Elasticsearch because of the nested structure)
First order: 2012-01-01
Last order: 2017-01-31
Number of means of payment types: 4
What we've already tried:
- increase/decrease JVM-Heap
 - increase/decrease number of primary shards
 - set property "index.store.preload" in elasticsearch.yml
 - use "eager_global_ordinals" in the mapping
 - set format of date to "epoch_second" in the mapping
 - use the official docker image of Elasticsearch
 
None of these adjustments lead to improved performance.
Varying the aggregations leads to the following times:
- omit date histogram: 877ms
 - omit filter/aggregations on order item level: 1055ms
 - omit date histogram and filter/aggregations on order item level: 319ms
 - change interval of date histogram from "month" to "year": 2111ms
 
Do you have any ideas how to improve performance, or is this the performance one would expect given the aggregations (date-histogram, terms, sum) and the nested document structure?
Thanks!