Hi,
We have aggregations with sub-aggregations over several levels and a nested document structure.
Running the aggregations takes up to 9 seconds with 20 million documents in the index.
We are wondering whether 9 seconds is "normal" given the sub-aggregations and the nested structure, or whether we are missing something.
This is what we have:
In our domain, we have orders and order items. An order contains one or more order items.
This is the mapping in Elasticsearch:
{
"orders": {
"mappings": {
"order": {
"properties": {
"creationDate": {
"type": "date"
},
"meansOfPaymentType": {
"type": "keyword"
},
"price": {
"type": "float"
},
"orderItem": {
"type": "nested",
"properties": {
"cancelled": {
"type": "boolean"
},
"quantity": {
"type": "integer"
},
"totalPrice": {
"type": "float"
}
}
}
}
}
}
}
}
This is the query:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"creationDate": {
"gte": "2012-01-01T00:00:00+01:00",
"lt": "2017-02-01T00:00:00+01:00"
}
}
},
{
"nested": {
"path": "orderItem",
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"term": {
"orderItem.cancelled": false
}
}
]
}
}
}
}
}
}
]
}
},
"aggs": {
"groupByInterval": {
"date_histogram": {
"field": "creationDate",
"interval": "month",
"time_zone": "CET",
"min_doc_count": 0,
"extended_bounds": {
"min": "2012-01-01T00:00:00",
"max": "2017-01-31T23:59:59"
}
},
"aggs": {
"groupByReportType": {
"terms": {
"size": 50,
"field": "meansOfPaymentType"
},
"aggs": {
"sumTotalPrice": {
"sum": {
"field": "price"
}
},
"orderItemsGroupedByReportType": {
"nested": {
"path": "orderItem"
},
"aggs": {
"filteredOrderItems": {
"filter": {
"bool": {
"filter": [
{
"term": {
"orderItem.cancelled": false
}
}
]
}
},
"aggs": {
"sumTotalOrderItemPrice": {
"sum": {
"field": "orderItem.totalPrice"
}
},
"sumTotalOrderItemQuantity": {
"sum": {
"field": "orderItem.quantity"
}
}
}
}
}
}
}
}
}
}
}
}
Operating system: MacOSX El Capitan, 2,7 GHz Intel Core i5, 16 GB 1867 MHz DDR3
JVM-Heap: 8 GB
Number of primary shards: 2
Number of replica shards: 0
Number of orders: 10 000 000 (results in 20 001 980 documents in Elasticsearch because of the nested structure)
First order: 2012-01-01
Last order: 2017-01-31
Number of means of payment types: 4
What we've already tried:
- increase/decrease JVM-Heap
- increase/decrease number of primary shards
- set property "index.store.preload" in elasticsearch.yml
- use "eager_global_ordinals" in the mapping
- set format of date to "epoch_second" in the mapping
- use the official docker image of Elasticsearch
None of these adjustments lead to improved performance.
Varying the aggregations leads to the following times:
- omit date histogram: 877ms
- omit filter/aggregations on order item level: 1055ms
- omit date histogram and filter/aggregations on order item level: 319ms
- change interval of date histogram from "month" to "year": 2111ms
Do you have any ideas how to improve performance, or is this the performance one would expect given the aggregations (date-histogram, terms, sum) and the nested document structure?
Thanks!