Help with improving composite aggregation search speed

Toma_Popov · April 28, 2020, 1:10pm

Hello,

Currently we are trying to aggregate quite a bit of data for one of our dashboards.

For this purpose we use composite aggregations to fetch and aggregate the data.

The index contains data for the last 25 days and has about 524 million documents:

green open hourly-request-aggregations-000001                upuOUWZ9SZKVQgaFYgYaqA  4 1 524262164       0  80.1gb    40gb

There are also other indices that are being indexed and searched.

And this is our cluster:

0.0.0.0 72 97 3 0.18 0.65 0.84 dilm - node-1
0.0.0.0 68 96 4 0.89 1.22 1.34 dilm - node-2
0.0.0.0 71 96 7 0.52 0.94 1.21 dilm * node-3
0.0.0.0 97 5 0.45 0.78 1.06 dil  - data-node-1

Where each of the nodes has the following specs:

OS: Centos 7
Elastic version: 7.6
RAM: 32 GB (16 GB for Elasticsearch and 16 GB for the OS)
CPU: 8 Cores

The machines are hosted on https://www.linode.com/

The query that we use for the aggregation is:

{
  "aggs": {
    "data": {
      "aggs": {
        "total1": {
          "sum": {
            "field": "field1"
          }
        },
        "total2": {
          "sum": {
            "field": "field2"
          }
        },
        "total3": {
          "sum": {
            "field": "field3"
          }
        },
        "total4": {
          "sum": {
            "field": "field4"
          }
        },
        "total5": {
          "sum": {
            "field": "field5"
          }
        }
      },
      "composite": {
        "size": 10000,
        "sources": [
          {
            "date": {
              "date_histogram": {
                "calendar_interval": "1d",
                "field": "@timestamp",
                "time_zone": "+0000"
              }
            }
          },
          { "term1": { "terms": { "field": "numbField" } } }
        ]
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "yyyy-MM-dd HH:mm:ss Z",
              "gte": "2020-04-01 00:00:00 +0000",
              "lte": "2020-04-06 23:59:59 +0000"
            }
          }
        }
      ]
    }
  },
  "size": 0
}
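For context, here is a minimal sketch of how the pages of a composite aggregation like the one above can be walked. It relies on the `after_key` that Elasticsearch returns with each composite response and on the `after` parameter of the next request. The helper name `next_page_body` and the trimmed-down `BASE_BODY` are hypothetical and only for illustration; no client call is made here, sending the body is left to whatever client is in use.

```python
import copy
from typing import Any, Dict, Optional

# Trimmed-down version of the query above (sub-aggregations and filter omitted).
BASE_BODY: Dict[str, Any] = {
    "size": 0,
    "aggs": {
        "data": {
            "composite": {
                "size": 10000,
                "sources": [
                    {"date": {"date_histogram": {
                        "field": "@timestamp", "calendar_interval": "1d"}}},
                    {"term1": {"terms": {"field": "numbField"}}},
                ],
            }
        }
    },
}


def next_page_body(
    base_body: Dict[str, Any],
    previous_response: Optional[Dict[str, Any]],
    agg_name: str = "data",
) -> Optional[Dict[str, Any]]:
    """Build the body for the next composite page, or None when done.

    Hypothetical helper: it only manipulates dictionaries.
    """
    if previous_response is None:
        # First page: send the query unchanged.
        return copy.deepcopy(base_body)

    agg = previous_response["aggregations"][agg_name]
    after_key = agg.get("after_key")
    if not agg.get("buckets") or after_key is None:
        # No buckets or no after_key means there is nothing left to fetch.
        return None

    body = copy.deepcopy(base_body)
    # Resume right after the last bucket of the previous page.
    body["aggs"][agg_name]["composite"]["after"] = after_key
    return body
```

The base body is copied on every call so that the `after` value of one page never leaks into the original query.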

Using this on smaller date ranges, like an hour or a day, works ok-ish: the request returns in a few seconds (not ideal, but good enough).

But going a bit further and setting a period of 7 days takes far longer, around 1 minute.

We also currently have services that run basically the same queries in the background, aggregate the results, and index them into a new index in order to reduce the amount of data we have.

I assume the searches from those services slow down the cluster as a whole.

Any advice on how to speed up the searches, and on how to identify slow searches across the cluster as a whole?
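One mechanism I am aware of for spotting slow searches is the per-index search slow log. Below is a minimal sketch of the settings body that would be sent with `PUT /<index>/_settings`. The helper name `slowlog_settings` is hypothetical and the threshold values are arbitrary assumptions, not recommendations.

```python
from typing import Dict


def slowlog_settings(
    query_warn: str = "10s",
    query_info: str = "5s",
    fetch_warn: str = "1s",
) -> Dict[str, str]:
    """Build an index settings body that enables the search slow log.

    Hypothetical helper; the thresholds are placeholder values to tune.
    """
    return {
        # Query phase slower than these thresholds is logged at that level.
        "index.search.slowlog.threshold.query.warn": query_warn,
        "index.search.slowlog.threshold.query.info": query_info,
        # Fetch phase slower than this threshold is logged at warn level.
        "index.search.slowlog.threshold.fetch.warn": fetch_warn,
    }


settings_body = slowlog_settings(query_warn="30s")
```

The settings are dynamic, so they can be applied to an existing index without reopening it.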

I have read the most popular articles and best practices advice for search optimization tuning.

Any help is welcome.

Thanks in advance.

system · May 26, 2020, 1:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
