Nested aggregation got too slow result

I have an index with a billions document in the future, for now it's around 20mil documents. It took over 10s to get the result while I need a query around 3-4s for billion documents. Is my structure was wrong or need to improve the query or server configuration? Im using amazon elasticsearch service.

This query will return amount/transactions/items of every station in every area
Query:

{
  "size" : 0,
  "query": {
        "bool": {
            "must": 
            [
                {
                    "range": {
                        "date_sec": {
                            "gte": "1483228800",
                            "lte": "1525046400"
                        }
                    }
                },
                {
                    "range": {
                        "time_sec": {
                            "gte": "32400",
                            "lte": "75600"
                        }
                    }
                }   
            ]
        }   
    },
  "aggs": {
    "numstoreamountclient" : {
        "filter" : { "range" : { "amount" : { "gt" : 0 } } },
        "aggs": { 
                "numstore_amountclient": { 
                    "cardinality" : {
                      "field" : "id_station"
                    }
                }
            } 
    },
    "id_station": {
      "terms": {
        "field": "id_station"
      },
      "aggs": {
        "area_type": {
          "terms": {
            "field": "area_type"
          },
          "aggs": {
                "max_time" : { "max" : { "field" : "time_sec" } },
                "min_time" : { "min" : { "field" : "time_sec" } },
                "amountclient": {
                  "sum": {
                    "field": "amount"
                  }
                },
                "itemclient": {
                  "sum": {
                    "field": "items"
                  }
                },
                "transactionclient" : { 
                    "value_count" : 
                    { 
                        "field" : "id" 

                    } 

                },
                "inwatchinghour": {
                  "filter" : { "term" : { "in_watchinghour" :  1 } },
                  "aggs" : {
                      "amountclientwatch": {
                              "sum": {
                                "field": "amount"
                              }
                        },
                        "itemclient": {
                              "sum": {
                                "field": "items"
                              }
                        },
                        "transactionclientwatch" : { 
                            "value_count" : 
                            { 
                                "field" : "id" 
                            } 
                        }
                    }
                },
                "saleclient": {
                  "filter" : { 
                    "bool": {
                        "must": 
                        [
                            {
                                "term" : { "in_watchinghour" :  1 } 
                            },
                            {
                                "range": {
                                    "items": {
                                        "gt": "0"
                                    }
                                }
                            },
                            {
                                "range": {
                                    "amount": {
                                        "gt": "0"
                                    }
                                }
                            }
                        ]
                    }   
                  },
                  "aggs" : {
                      "sale_client" : { 
                            "value_count" : 
                            { 
                                "field" : "id" 
                            } 
                        }
                    }
                }
            }
        }
      }
    }
  }
}

Execute time is: 11s

What instance types is your cluster deployed on? How many nodes do you have in the cluster? What does CPU usage look like on the nodes while you are running this query? How many indices and shards is your data spread across?

Hi Christian,
Im using:

  • m4.large.elasticsearch
  • single node
  • CPU less than 1%
  • only 3 indices but 2 is default and small data
  • Number of nodes 1
  • Number of data nodes 1
  • Active primary shards 11
  • Active shards 11

Im in Thailand and Vietnam, ES server in Singapore

That sounds far too low. Are you sure this captures the CPU usage while the query is running? How often is monitoring polling data?

You can see the last hour I checked with this query, it's very low

Can you zoom in around the time the query ran? If the sample interval is not granular enough, it may be that a spike in CPU usage that only lasts 10 seconds is not registered.

My mistake, this is the last query I just checked.

Can you zoom in further and reduce the period? is this showing the max value?

Hi Christian,
Max value of CPU is 19% at the time I was executing the query, this is max zoom in amazon tool. I can not zoom in anymore, which information that you need more?
19% I think is very high, 5 query like this will kill the server

I am not familiar with Amazon monitoring, so do not know if what you are seeing is accurate or not. This makes it is hard to tell what is limiting performance.

This is another monitoring part(from 23:00)

2018-04-24_2308_0012018-04-24_2308

Did you understand the aggregation above? Is it problem when im using many nested aggs?

JSON document:

{
  "_index": "c980_datacashier",
  "_type": "doc",
  "_id": "35512229",
  "_version": 2,
  "_score": 2,
  "_source": {
    "receiptid": "104-1245104147-42-3",
    "ticket": 3,
    "id_employee": 104147,
    "roomtype": null,
    "id_station": 2209,
    "total_time": null,
    "time_begin": 45900,
    "date_sec": 1509753600,
    "promotion": null,
    "items": 4,
    "in_watchinghour": 0,
    "time_end": 46200,
    "status_check": 0,
    "numberofclients": null,
    "shift": null,
    "@timestamp": "2018-04-23T21:25:55.397Z",
    "area_type": 1,
    "id_clientstation": 104,
    "time_sec": 45900,
    "employee": null,
    "status_2": null,
    "cashier": null,
    "room": null,
    "amount": 157678,
    "id_cashier": 42,
    "status_1": 0,
    "update_time": "2018-04-22T11:47:25.383Z",
    "@version": "1",
    "deleted": 0,
    "id": 35512229
  },
  "fields": {
    "time_begin": [
      "1970-01-01T00:00:45.900Z"
    ],
    "update_time": [
      "2018-04-22T11:47:25.383Z"
    ],
    "@timestamp": [
      "2018-04-23T21:25:55.397Z"
    ],
    "time_end": [
      "1970-01-01T00:00:46.200Z"
    ]
  }
}

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.